Abstract. Uncertain information is commonplace in real-world data 
management scenarios. The ability to represent large sets of possible 
instances (worlds) while supporting efficient storage and processing is 
an important challenge in this context. The recent formalism of world- 
set decompositions (WSDs) provides a space-efficient representation for 
uncertain data that also supports scalable processing. WSDs are com- 
plete for finite world-sets in that they can represent any finite set of 
possible worlds. For possibly infinite world-sets, we show that a natu- 
ral generalization of WSDs precisely captures the expressive power of 
c-tables. We then show that several important problems are efficiently 
solvable on WSDs while they are NP-hard on c-tables. Finally, we give 
a polynomial-time algorithm for factorizing WSDs, i.e. an efficient algo- 
rithm for minimizing such representations. 



1 Introduction 

Recently there has been renewed interest in incomplete information databases. 
This is due to the many important applications that systems for representing 
incomplete information have, such as data cleaning, data integration, and scientific 
databases. 

Strong representation systems [19,3,18] are formalisms for representing sets 
of possible worlds which are closed under query operations in a given query 
language. While there have been numerous other approaches to dealing with 
incomplete information, such as closing possible worlds semantics using certain 
answers [1,7,12], constraint or database repair [15,10,9], and probabilistic ranked 
retrieval [14,4], strong representation systems form a compositional framework 
that is minimally intrusive in that it does not require discarding information, even about 
the lack of information, present in an information system: computing certain 
answers, for example, entails a loss of possible but uncertain information. Strong 
representation systems can be nicely combined with the other approaches. For 
example, data transformation queries and data cleaning steps effected within a 
strong representation systems framework can be followed by a query with ranked 
retrieval or certain answers semantics, closing the possible worlds semantics. 



* This article is an extended version of the paper [6] that has appeared in the 
Proceedings of the International Conference on Database Theory (ICDT) 2007. 



The so-called c-tables [19,16,17] are the prototypical strong representation 
system. However, c-tables are not well suited for representing large incomplete 
databases in practice. Two recent works presented strong, indeed complete, rep- 
resentation systems for finite sets of possible worlds. The approach of the Trio 
x-relations [8] relies on a form of intensional information ("lineage") only in 
combination with which the formalism is strong. In [5], large sets of possible worlds 
are managed using world-set decompositions (WSDs). The approach is based 
on relational product decomposition to permit space-efficient representation. [5] 
describes a prototype implementation and shows the efficiency and scalability of 
the formalism in terms of storage and query evaluation in a large census data 
scenario with up to 2^" worlds, where each world stored is several GB in size. 

Examples of world-set decompositions. As WSDs play a central role in 
this work, we next exemplify them using two manually completed forms that 
may originate from a census and which allow for more than one interpretation 
(Figure 1). For simplicity we assume that social security numbers consist of only 
three digits. For instance, Smith's social security number can be read either as 
"185" or as "785". We can represent the available information using a relation 
in which possible alternative values are represented in set notation (so-called 
or-sets): 
This or-set relation represents 2 · 2 · 2 · 4 = 32 possible worlds. 

We now enforce the integrity constraint that all social security numbers be 
unique. For our example database, this constraint excludes 8 of the 32 worlds, 
namely those in which both tuples have the value 185 as social security number. 
It is impossible to represent the remaining 24 worlds 
using or-set relations. What we could do is store each world explicitly using a 
table called a world-set relation of a given set of worlds. Each tuple in this table 
represents one world and is the concatenation of all tuples in that world (see 
Figure 1). 
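The counting argument above can be checked mechanically. The following Python sketch enumerates the worlds of an or-set relation and filters them by the uniqueness constraint. The field names (t1.S, t1.M, t2.S, t2.M), the alternative 186 for the second social security number, and the marital-status codes are assumptions made for illustration; only the readings 185 and 785 and the counts 32, 8, and 24 are taken from the text.

```python
from itertools import product

# Or-set relation: each field maps to its list of alternative values.
# The value 186 and the marital-status codes are illustrative assumptions.
or_sets = {
    "t1.S": [185, 785],
    "t1.M": [1, 2],
    "t2.S": [185, 186],
    "t2.M": [1, 2, 3, 4],
}

fields = list(or_sets)
# Every combination of alternatives is one possible world.
worlds = [dict(zip(fields, combo)) for combo in product(*or_sets.values())]

# Enforce uniqueness of social security numbers.
clean = [w for w in worlds if w["t1.S"] != w["t2.S"]]

print(len(worlds), len(worlds) - len(clean), len(clean))  # prints: 32 8 24
```

The filtered list `clean` plays the role of the world-set relation: one entry per world, each entry holding all fields of that world.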

A world-set decomposition is a decomposition of a world-set relation into 
several relations such that their product (using the product operation of rela- 
tional algebra) is again the world-set relation. The world-set represented by our 
initial or-set relation can also be represented by the product 
In the same way we can represent the result of data cleaning with the 
uniqueness constraint for the social security numbers as the product 



[Figure 1: two handwritten survey forms, each with the fields Social Security Number, Name, and Marital status ((1) single, (2) married, (3) divorced, (4) widowed).]
Fig. 1. Two completed survey forms and a world-set relation representing the 
possible worlds with unique social security numbers. 
One can observe that the result of this product is exactly the world-set 
relation in Figure 1. The decomposition is based on the independence between sets 
of fields, subsequently called components. Only fields that depend on each other, 
for example $t_1.S$ and $t_2.S$, belong to the same component. Since $\{t_1.S, t_2.S\}$ and 
$\{t_1.M\}$ are independent, they are put into different components. 
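A minimal sketch of this decomposition, assuming the same hypothetical field names and the hypothetical alternative 186 used for illustration: the component over the two social security numbers lists only the unique combinations, and the relational product of all components yields the cleaned world-set.

```python
from itertools import product

# Components of a WSD: each is (schema, list of tuples).
# Values other than 185 and 785 are illustrative assumptions.
components = [
    (("t1.S", "t2.S"), [(185, 186), (785, 185), (785, 186)]),
    (("t1.M",), [(1,), (2,)]),
    (("t2.M",), [(1,), (2,), (3,), (4,)]),
]

def expand(components):
    """Relational product of all components, one dict per world."""
    schemas = [schema for schema, _ in components]
    for choice in product(*(rows for _, rows in components)):
        world = {}
        for schema, row in zip(schemas, choice):
            world.update(zip(schema, row))
        yield world

wsd_worlds = list(expand(components))
# Number of field values stored in the decomposition versus the flat relation.
stored_values = sum(len(schema) * len(rows) for schema, rows in components)
flat_values = len(wsd_worlds) * 4
```

In this toy instance the decomposition stores 12 field values while the flat world-set relation stores 96, which illustrates why dependent fields are grouped and independent ones are kept apart.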

WSDs can be naturally viewed as c-tables whose formulas have been put into 
a normal form represented by the component relations. The following c-table 
with global condition $\phi$ is equivalent to the WSD with our integrity constraint 
enforced. 
Formal definitions of WSDs and c-tables will be given in the body of this 
article. 



Contributions. The main goal of this work is to develop expressive yet effi- 
cient representation systems for infinite world-sets and to study the theoretical 
properties (such as expressive power, complexity of query-processing, and mini- 
mization) of these representation systems. Many of these results also apply to - 
and are new for - the world-set decompositions of [5]. 
Table 1. Decision Problems for Representation Systems. 



In TS], a strong argument is made supporting c-tables as a benchmark for 
the expressiveness of representation systems; we concur. Concerning efficient 
processing, we adopt a less expressive syntactic restriction of c-tables, called 
v-tables [19,3], as a lower bound regarding succinctness and complexity. The 
main development of this article is a representation system that combines, in 
a sense, the best of all worlds: (1) It is just as expressive as c-tables, (2) it is 
exponentially more succinct than unions of v-tables, and (3) on most classical 
decision problems, the complexity bounds are not worse than those for v-tables. 

In more detail, the technical contributions of this article are as follows: 

— We introduce gWSDs, an extension of the WSD model of [5] with variables 
and possibly negated equality conditions. 

— We show that gWSDs are expressively equivalent to c-tables and are there- 
fore a strong representation system for full relational algebra. 

— We study the complexity of the main data management problems [3,19] 
regarding WSDs and gWSDs, summarized in Table 1. Table 2 compares the 
complexities of these problems in our context to those of existing strong 
representation systems like the well-behaved ULDBs of Trio and c-tables. 

— We present an efficient algorithm for optimizing gWSDs, i.e., for computing 
an equivalent gWSD whose size is smaller than that of a given gWSD. In 
the case of WSDs, this is a minimization algorithm that produces the unique 
maximal decomposition of a given WSD. 

One can argue that gWSDs are a practically more applicable representation 
formalism than c-tables: While having the same expressive power, many 
important problems are easier to solve. Indeed, as shown in Table 2, the complexity 
results for gWSDs on many important decision problems are identical to those 
for the much weaker v-tables. At the same time WSDs are still concise enough 
to support the space-efficient representation of very large sets of possible worlds 

^ This article extends the ICDT 2007 paper with proofs, a modified algorithm for relational factorization 
with better space complexity, and new data complexity results for tuple q-possibility, 
tuple q-certainty, and instance q-certainty, where q is a full or positive relational 
algebra query. 

* The complexity results for Trio are from ^ and were not verified by the authors. 
Table 2. Comparison of data complexities for standard decision problems. 



(cf. the experimental evaluation on WSDs in [5]). Also, while gWSDs are strictly 
stronger than Trio representations, which can only represent finite world-sets, 
the complexity characteristics are better. 

The results on finding maximal product decompositions relate to earlier 
work done by the database theory community on relational decomposition given 
schema constraints (cf. e.g. [2]). Our algorithms do not assume such constraints 
and only take a snapshot of a database at a particular point in time into 
consideration. Consequently, updates may require altering a decomposition. Nevertheless, 
our results may be of interest independently from WSDs as for instance in cer- 
tain scenarios with very dense relations, decompositions may be a practically 
relevant technique for efficiently storing and querying large databases. 

Note that we do not consider probabilistic approaches to representing uncer- 
tain data (e.g. the recent work [M]) in this article. However, there is a natural and 
straightforward probabilistic extension of WSDs which directly inherits many of 
the properties studied in this article, see [5]. 

The structure of the article basically follows the list of contributions. 

2 Preliminaries 

We use the named perspective of the relational model and relational algebra 
with the operations selection $\sigma$, projection $\pi$, product $\times$, union $\cup$, difference $-$, 
and renaming $\delta$. 

A relation schema is a construct of the form $R[U]$, where $R$ is a relation 
name and $U$ is a nonempty set of attribute names. Let $D$ be an infinite set of 
atomic values, the domain. A relation over schema $R[A_1, \ldots, A_k]$ is a finite set 
of tuples $(A_1 : a_1, \ldots, A_k : a_k)$ where $a_1, \ldots, a_k \in D$. A relational schema is a 
tuple $\Sigma = (R_1[U_1], \ldots, R_k[U_k])$ of relation schemas. A relational structure (or 
database) $A$ over schema $\Sigma$ is a tuple $(R_1^{A}, \ldots, R_k^{A})$, where each $R_i^{A}$ is a relation 
over schema $R_i[U_i]$. When no confusion may occur, we will also use $R$ rather 
than $R^{A}$ to denote one particular relation over schema $R[U]$. For a relation $R$, 

^ For technical reasons involving the WSDs presented later, we exclude nullary 
relations and will represent these (e.g., when obtained as results from a Boolean query) 
using unary relations over a special constant "true". 



$\mathrm{sch}(R)$ denotes the set of its attributes, $\mathrm{ar}(R)$ its arity, and $|R|$ the number of 
tuples in $R$. 

A set of possible worlds (or world-set) over schema $\Sigma$ is a set of databases 
over schema $\Sigma$. Let $\mathbf{W}$ be a set of finite structures, and let $\mathrm{rep}$ be a function 
that maps each $W \in \mathbf{W}$ to a world-set of the same schema. Then $(\mathbf{W}, \mathrm{rep})$ is 
called a strong representation system for a query language if, for each query $Q$ 
of that language and each $W \in \mathbf{W}$ such that the schema of $Q$ is consistent with 
the schema of the worlds in $\mathrm{rep}(W)$, there is a structure $W' \in \mathbf{W}$ such that 
$\mathrm{rep}(W') = \{Q(A) \mid A \in \mathrm{rep}(W)\}$. 

2.1 Tables 

We now review a number of representation systems for incomplete information 
that are known from earlier work (cf. e.g. [17,2]). 

Let $X$ be a set of variables. We call an equality of the form $x = c$ or $x = y$, 
where $x$ and $y$ are variables from $X$ and $c$ is from $D$, an atomic condition, and 
will define (general) conditions as Boolean combinations (using conjunction, 
disjunction, and negation) of atomic conditions and the constant "true". 

Definition 1 (c-table). A c-multitable [19,17] over schema $(R_1[U_1], \ldots, R_k[U_k])$ 
is a tuple 

$$T = (R_1^{T}, \ldots, R_k^{T}, \phi^{T}, \lambda^{T})$$

where each $R_i^{T}$ is a set of $\mathrm{ar}(R_i)$-tuples over $D \cup X$, $\phi^{T}$ is a Boolean combination 
over equalities on $D \cup X$ called the global condition, and function $\lambda^{T}$ assigns 
each tuple from one of the relations $R_1^{T}, \ldots, R_k^{T}$ to a condition (called the local 
condition of the tuple). A c-multitable with $k = 1$ is called a c-table. 

The semantics of a c-multitable $T$, called its representation $\mathrm{rep}(T)$, is defined 
via the notion of a valuation of the variables occurring in $T$ (i.e., those in the 
tuples as well as those in the conditions). Let $\nu : X \to D$ be a valuation that 
assigns each variable in $T$ to a domain value. We overload $\nu$ in the natural way 
to map tuples and conditions over $D \cup X$ to tuples and formulas over $D$. A 
satisfaction of $T$ is a valuation $\nu$ such that $\nu(\phi^{T})$ is true. A satisfaction $\nu$ takes 
$T$ to a relational structure $\nu(T) = (R_1^{\nu(T)}, \ldots, R_k^{\nu(T)})$ where each relation $R_i^{\nu(T)}$ 
is obtained as $R_i^{\nu(T)} := \{\nu(t) \mid t \in R_i^{T} \wedge \nu(\lambda^{T}(t)) \text{ is true}\}$. The representation of 
$T$ is now given by its satisfactions, $\mathrm{rep}(T) := \{\nu(T) \mid \nu \text{ is a satisfaction of } T\}$. 

□ 
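The semantics of Definition 1 can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: the domain is replaced by a small finite set so that all valuations can be enumerated, whereas D is infinite in the definition. The function name `rep`, the encoding of variables as strings and of conditions as Python predicates, and the example table are all hypothetical.

```python
from itertools import product

def rep(tuples, global_cond, variables, domain):
    """Worlds of a c-table, restricted to valuations over a finite `domain`.

    tuples: list of (terms, local_condition); a term is a variable name (str)
    or a constant; conditions are predicates over a valuation dict.
    """
    worlds = set()
    for values in product(domain, repeat=len(variables)):
        nu = dict(zip(variables, values))
        if not global_cond(nu):           # nu is not a satisfaction of T
            continue
        world = frozenset(
            tuple(nu[a] if isinstance(a, str) else a for a in terms)
            for terms, local in tuples
            if local(nu)                  # keep a tuple only if its local condition holds
        )
        worlds.add(world)
    return worlds

# Hypothetical c-table with two tuples (A: x, B: 1) and (A: z, B: y).
table = [
    (("x", 1), lambda nu: nu["x"] != 2),
    (("z", "y"), lambda nu: nu["y"] != 2),
]
worlds = rep(table, lambda nu: True, ["x", "y", "z"], [1, 2, 3])
```

Each element of `worlds` is one relation instance; the valuation x = 2, y = 1, z = 2, for example, contributes the world containing only the tuple (2, 1).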

Example 1. Section 1 gives a c-table $T$ representing our uncertain census data 
of Figure 1. $T$ uses one variable per uncertain field and lists the possible values 
of the variables in the global condition $\phi$. Each satisfaction of $T$ defines a world 
and there are 24 such worlds. The local conditions in $T$ are "true" and omitted. 

FigurelHKa) shows a c-table T, where both tuples have local conditions. T has 
infinitely many satisfactions and thus defines an infinite world-set. For example, 
the satisfaction $\{x \mapsto 2, y \mapsto 1, z \mapsto 2\}$ defines the world $A$ with relation 

$$R^{A} = \{\nu((A : x, B : 1)) \mid \nu(x \neq 2) \text{ is true}\} \cup \{\nu((A : z, B : y)) \mid \nu(y \neq 2) \text{ is true}\} = \{(A : 2, B : 1)\}.$$ □

^ Done by extending $\nu$ to be the identity on domain values and to commute with the 
tuple constructor, the Boolean operations, and equality. 

Proposition 1 ([19]). The c-multitables are a strong representation system for 
relational algebra. 

We consider two important restrictions of c-multitables. 

1. By a g-multitable [3], we refer to a c-multitable in which the global condition 
$\phi^{T}$ is a conjunction of possibly negated equalities and $\lambda^{T}$ maps each tuple to 
"true". 

2. A v-multitable is a g-multitable in which the global condition $\phi^{T}$ is a 
conjunction of equalities. 

Without loss of generality, we may assume that the global condition of a 
g-multitable is a conjunction of negated equalities and the global condition of a 
v-multitable is simply "true". Subsequently, we will always assume these two 
normal forms and omit local conditions from g-multitables and both global and 
local conditions from v-multitables. 
Fig. 2. A g-multitable $T$ (a), possible world $A$ (b), and a valuation $\nu$ s.t. $\nu(T) = A$ 
(c). 

Example 2. Consider the g-multitable $T = (R^{T}, S^{T}, \phi^{T})$ of Figure 2(a). Then 
the valuation of Figure 2(c) satisfies the global condition of $T$, as $\nu(x) \neq \nu(y)$. 
Thus $A \in \mathrm{rep}(T)$, where $A$ is the structure from Figure 2(b). □

Remark 1. It is known from [TH] that v-tables are not a strong representation 
system for relational selection, but for the fragment of relational algebra built 
from projection, product, and union. 

The definition of c-multitables used here is from [17]. The original definition 
from [19] has been more restrictive in requiring the global condition to be "true". 
While c-tables without a global condition are strictly weaker (they cannot repre- 
sent the empty world-set), they nevertheless form a strong representation system 
for relational algebra. 

In [2], the global conditions of c-multitables are required to be conjunctions 
of possibly negated equalities. It will be a corollary of a result of this paper 
(Theorem [5]) that this definition is equivalent to c-multitables with arbitrary 
global conditions. □ 

^ Each g-multitable resp. v-multitable can be reduced to one in this normal form by 
variable replacement and the removal of tautologies such as $x = x$ or $1 = 1$ from the 
global condition. 



We next define a restricted form of c-tables, called mutex-tables (or x-tables 
for short). This formalism is of particular importance in this paper as it is closely 
related to gWSDs, our main representation formalism. An x-table is a c-table 
where the global condition is a conjunction of negated equalities and the local 
conditions are conjunctions of equalities and a special form of negated equalities. 
We make this more precise next. 

Consider a set of variables $Y$ and a function $\mu : Y \to \mathbb{N}^{+}$ mapping variables 
to positive numbers. The mutex set $M(Y, \mu)$ for $Y$ and $\mu$ is defined by 

$$\{\text{"true"}\} \cup \{(x = i) \mid x \in Y, 1 \le i \le \mu(x)\} \cup \{(x \neq 1 \wedge \ldots \wedge x \neq \mu(x)) \mid x \in Y\}.$$

Intuitively, M defines for each variable of Y possibly negated equalities such that 
a variable valuation satisfies precisely one of these conditions. 
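The "precisely one" property can be illustrated with a short Python sketch. The encoding is an assumption made for illustration: conditions are (label, predicate) pairs, the constant "true" is left out, and `mutex_set` and `satisfied` are hypothetical helper names.

```python
def mutex_set(mu):
    """Conditions of M(Y, mu) other than "true", as (label, predicate) pairs.

    mu maps each variable name in Y to a positive integer.
    """
    conds = []
    for x, bound in mu.items():
        # One equality x = i for every 1 <= i <= mu(x).
        for i in range(1, bound + 1):
            conds.append((f"{x} = {i}", lambda nu, x=x, i=i: nu[x] == i))
        # One conjunction of negated equalities covering all remaining values.
        conds.append((
            " and ".join(f"{x} != {i}" for i in range(1, bound + 1)),
            lambda nu, x=x, bound=bound: all(nu[x] != i for i in range(1, bound + 1)),
        ))
    return conds

def satisfied(conds, nu):
    """Labels of the conditions that hold under valuation nu."""
    return [label for label, pred in conds if pred(nu)]

conds = mutex_set({"x": 2, "y": 1})
print(satisfied(conds, {"x": 3, "y": 1}))  # ['x != 1 and x != 2', 'y = 1']
```

For every valuation, exactly one condition per variable of Y is reported, which is what makes these conditions usable as mutual-exclusion markers in local conditions.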

Definition 2 (x-table). An x-multitable is a c-multitable 

$$T = (R_1^{T}, \ldots, R_k^{T}, \phi^{T}, \lambda^{T})$$

where (1) the global condition $\phi^{T}$ is a conjunction of negated equalities, (2) all 
local conditions defined by $\lambda^{T}$ are conjunctions over formulas from a mutex set 
$M(Y, \mu)$ and equalities over $X \cup D$, and (3) the variables in $Y$ do not occur in 
$R_1^{T}, \ldots, R_k^{T}, \phi^{T}$. An x-multitable with $k = 1$ is called an x-table. □



Example 3. Figure 5(b) shows an x-table $T$ over the mutex set $M(Y, \mu)$ where 
$Y = \{x_1\}$ and $\mu(x_1) = 1$. The mutex conditions on $x_1$ are used to state that 
instantiations of the first tuple cannot occur in the same worlds with instantiations 
of the last two tuples. 

Figure 7(b) shows an x-multitable $T$ over a mutex set with $Y = \{x_1, x_3\}$ 
and $\mu(x_1) = \mu(x_3) = 1$. The mutex conditions on $x_1$ are used to state that 
instantiations of the first two tuples of $R$ and of the first tuple of $S$ cannot occur 
in the same worlds with instantiations of the third tuple of $R$ and the second 
tuple of $S$. For example, the satisfaction $\{x_1 \mapsto 2, x_3 \mapsto 2, y \mapsto 3, z \mapsto 4\}$ of $T$ 
defines the world $A$ with $R^{A} = \{(A : 2), (A : 1)\}$ and $S^{A} = \{(B : 2)\}$, whereas 
the satisfaction $\{x_1 \mapsto 1, x_3 \mapsto 1, y \mapsto 3, z \mapsto 4\}$ defines the world $B$ with 
$R^{B} = \{(A : 2), (A : 3), (A : 1)\}$ and $S^{B} = \{(B : 4), (B : 1)\}$. □

It will be a corollary of joint results of this paper (Lemma 1 and Theorem 2) 
that x-multitables are as expressive as c-multitables. 

Proposition 2. The x-multitables capture the c-multitables. 

This result implies that x-multitables are a strong representation system for 
relational algebra. In this paper, however, we will make particular use of a weaker 
form of strongness, namely for positive relational algebra, in conjunction with 
efficient query evaluation. 



Proposition 3. The x-multitables are a strong representation system for pos- 
itive relational algebra. The evaluation of positive relational algebra queries on 
x-multitables has polynomial data complexity. 



Proof. We use the algorithm of [19,17] for the evaluation of relational algebra 
queries on c-multitables and obtain an answer c-multitable of polynomial size. 
Consider a fixed positive relational algebra query Q, c-multitable T, and c-table 
T', where T' represents the answer to Q on T. We compute T' by recursively 
applying each operator in Q. The evaluation follows the relational case except 
for the computation of global and local conditions (which do not exist in the 
relational case). The global condition of T becomes the global condition of T'. 
For projection and union, tuples preserve their local conditions from the input. 
In case of selection, the local condition of a result tuple is the conjunction of the 
local condition of the input tuple and, if required by the selection condition, of 
new equalities involving variables in the tuple and constants from the positive 
selections of Q. In case of product, the local condition of a result tuple is the 
conjunction of the local conditions of the constituent input tuples. 

The local conditions in T' are thus conjunctions of local conditions of T and 
possibly additional equalities. In case T is an x-table, then its local conditions 
are conjunctions over formulas from a mutex set M and further equalities. Thus 
the local conditions of T' are also conjunctions over formulas from M and further 
equalities. T' is then an x-table. □ 
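The way local conditions propagate through the positive operators can be sketched as follows. This is a simplified illustration and not the algorithm of the cited works: rows are dictionaries, variables are strings, a local condition is a frozenset of atoms read as a conjunction, and only selection with a constant is covered. All function names are hypothetical.

```python
def select_eq(table, attr, const):
    """Selection attr = const: conjoin an equality when the field is a variable."""
    out = []
    for row, cond in table:
        val = row[attr]
        if isinstance(val, str):            # variable: extend the local condition
            out.append((row, cond | {(val, "=", const)}))
        elif val == const:                  # constant that already matches
            out.append((row, cond))
    return out

def cross(left, right):
    """Product: concatenate rows (disjoint attribute names), conjoin conditions."""
    return [({**r1, **r2}, c1 | c2) for r1, c1 in left for r2, c2 in right]

def project(table, attrs):
    """Projection: keep local conditions unchanged."""
    return [({a: row[a] for a in attrs}, cond) for row, cond in table]

R = [({"A": "x", "B": 2}, frozenset()), ({"A": 1, "B": "x"}, frozenset())]
S = [({"C": 1}, frozenset({("y", "=", 3)}))]
answer = cross(select_eq(R, "A", 1), S)
```

Every local condition in `answer` is a conjunction of input conditions plus added equalities, which mirrors the closure argument of the proof: no operator introduces a disjunction or a new negation.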

3 New Representation Systems 

This section introduces novel representation systems beyond those surveyed in 
the previous section. We start with finite sets of v(g-,c-)tables, or tabsets for 
short, then show how to inline tabsets into tabset-tables, and finally introduce 
decompositions of such tabset-tables based on relational product. Such decom- 
positions are our main vehicle for representing incomplete data and the next 
sections are dedicated to their expressiveness and efficiency. 

3.1 Tabsets and Tabset Tables 

We consider finite sets of multitables as representation systems, and will refer 
to such constructs as tabsets (rather than as multitable-sets, to be short). 

A g-(resp., v-)tabset $\mathbf{T} = \{T_1, \ldots, T_n\}$ is a finite set of g-(v-)multitables. The 
representation of a tabset is the union of the representations of the constituent 
multitables, 

$$\mathrm{rep}(\mathbf{T}) := \mathrm{rep}(T_1) \cup \cdots \cup \mathrm{rep}(T_n).$$

Note that finite sets of v-multitables are more expressive than v-multitables: 
v-tabsets can trivially represent any finite world-set with one v-multitable 
representing precisely one world. It is known [2] that no v-multitable can represent 
the world-set consisting of an empty world and a non-empty world, as produced 
by, e.g., selection queries on v-multitables. 

We next construct an inlined representation of a tabset as a single table by 
turning each multitable into a single tuple. 

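Let $\mathbf{A}$ be a g-tabset over schema $\Sigma$. For each $R[U]$ in $\Sigma$, let $|R|_{\max} = 
\max\{|R^{A}| : A \in \mathbf{A}\}$ denote the maximum cardinality of $R$ in any multitable 
of $\mathbf{A}$. Given a g-multitable $A \in \mathbf{A}$ with $R^{A} = \{t_1, \ldots, t_{|R^{A}|}\}$, let $\mathrm{inline}(R^{A})$ be 
the tuple obtained as the concatenation (denoted $\circ$) of the tuples of $R^{A}$ padded 
with a special tuple $t_{\bot}$ up to arity $|R|_{\max}$, 

$$\mathrm{inline}(R^{A}) := t_1 \circ \cdots \circ t_{|R^{A}|} \circ (\underbrace{t_{\bot}, \ldots, t_{\bot}}_{|R|_{\max} - |R^{A}|}), \quad \text{where } t_{\bot} = (\underbrace{\bot, \ldots, \bot}_{\mathrm{ar}(R)}).$$

Then tuple 

$$\mathrm{inline}(A) := \mathrm{inline}(R_1^{A}) \circ \cdots \circ \mathrm{inline}(R_{|\Sigma|}^{A})$$

encodes all the information in $A$. 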

We make use of the symbol $\bot$ to align the g-tables of different sizes and 
uniformly inline g-tabsets. Given a g-multitable $A$ padded with additional tuples 
$t_{\bot}$, there is no world represented by $\mathrm{inline}(A)$ that contains instantiations of 
these tuples. We extend this interpretation and generally define as $t_{\bot}$ any tuple 
that has at least one symbol $\bot$, i.e., $(A_1 : a_1, \ldots, A_n : a_n)$, where at least one $a_i$ 
is $\bot$, is a $t_{\bot}$ tuple. This allows for several different inlinings that represent the 
same world-set. 
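A minimal Python sketch of inlining, under illustrative assumptions: a multitable is a dictionary from relation names to lists of rows in a fixed order, `None` stands for the padding symbol, and the function names are hypothetical.

```python
BOTTOM = None  # stands for the padding symbol

def inline_relation(rows, arity, max_card):
    """Concatenate the tuples of one relation and pad with bottom-tuples."""
    flat = [value for row in rows for value in row]
    flat += [BOTTOM] * (arity * (max_card - len(rows)))
    return tuple(flat)

def inline_tabset(tabset, arities):
    """Turn each multitable (dict: relation name -> list of rows) into one tuple."""
    max_card = {r: max(len(t[r]) for t in tabset) for r in arities}
    return [
        tuple(v for r in arities
              for v in inline_relation(t[r], arities[r], max_card[r]))
        for t in tabset
    ]

tabset = [{"R": [(1, 2), (5, 6)]}, {"R": [(3, 4)]}]
tst = inline_tabset(tabset, {"R": 2})
print(tst)  # [(1, 2, 5, 6), (3, 4, None, None)]
```

All resulting tuples have the same length, so they can be stored together in one table regardless of how many tuples each original multitable contained.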

Definition 3 (gTST). Given an inlining function inline, a g-tabset table (gTST) 
of a g-tabset $\mathbf{A}$ is the pair $(W, \lambda)$ consisting of the table $W = \{\mathrm{inline}(A) \mid A \in 
\mathbf{A}\}$ and the function $\lambda$ which maps each tuple $\mathrm{inline}(A)$ of $W$ to the global 
condition of $A$. □

A vTST (TST) is obtained in strict analogy, omitting $\lambda$ ($\lambda$ and variables). 

To compute $\mathrm{inline}(R^{A})$, we have fixed an arbitrary order of the tuples in $R^{A}$. 
We represent this order by using indices $d_i$ to denote the $i$-th tuple in $R^{A}$ for 
each g-multitable $A$, if that tuple exists. Then the TST has schema 

$$\{R.d_i.A_j \mid R[U] \text{ in } \Sigma,\ 1 \le i \le |R|_{\max},\ A_j \in U\}.$$

Example 4. An example translation from a tabset to a TST is given in Figure 3. 

The semantics of a gTST $(W, \lambda)$ as a representation system is given in strict 
analogy with tabsets, 

$$\mathrm{rep}(W, \lambda) := \bigcup \{\mathrm{rep}(\mathrm{inline}^{-1}(t), \lambda(t)) \mid t \in W\}.$$

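Remark 2. Computing the inverse of "inline" is an easy exercise. In particular, 
we map $\mathrm{inline}(R^{A})$ to $R^{A}$ as 

$$(a_1, \ldots, a_{\mathrm{ar}(R) \cdot |R|_{\max}}) \mapsto \{(a_{\mathrm{ar}(R) \cdot k + 1}, \ldots, a_{\mathrm{ar}(R) \cdot (k+1)}) \mid 0 \le k < |R|_{\max},\ a_{\mathrm{ar}(R) \cdot k + 1} \neq \bot, \ldots, a_{\mathrm{ar}(R) \cdot (k+1)} \neq \bot\}.$$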
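The inverse mapping of Remark 2 can be sketched as follows, again with `None` standing for the padding symbol and a hypothetical function name. The inlined tuple is cut into chunks of length ar(R), and every chunk containing the padding symbol is dropped.

```python
BOTTOM = None  # stands for the padding symbol

def uninline_relation(flat, arity):
    """Recover a relation from its inlined tuple.

    Chunk k covers positions arity*k .. arity*(k+1)-1 (0-based); a chunk
    with at least one padding symbol represents no tuple and is skipped.
    """
    chunks = [tuple(flat[i:i + arity]) for i in range(0, len(flat), arity)]
    return {c for c in chunks if BOTTOM not in c}

print(uninline_relation((3, 4, None, None), 2))  # {(3, 4)}
```

Because a chunk is dropped as soon as one of its fields is padding, a partially padded chunk such as (1, None) is treated like a fully padded one, in line with the extended reading of padded tuples given earlier.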

By construction, the TSTs capture the tabsets. 
Proposition 4. The gTSTs (resp., vTSTs) capture the g-(v-)tabsets. 
* Note that this table may contain variables and occurrences of the $\bot$ symbol. 
(b): TST of tabset $\{A, B, C\}$. 
Fig. 3. Translation from a tabset (a) to a TST (b). 



Finally, there is a noteworthy normal form for gTSTs. 

Proposition 5. The gTSTs in which $\lambda$ maps each tuple to a common global 
condition $\phi$ unique across the gTST, that is, $\lambda : \cdot \mapsto \phi$, capture the gTSTs. 

Proof. Given a g-tabset $\mathbf{A}$, we may assume without loss of generality that no 
two g-multitables from $\mathbf{A}$ share a common variable, either in the tables or the 
conditions, and that all global conditions in $\mathbf{A}$ are satisfiable. (Otherwise we 
could safely remove some of the g-multitables in $\mathbf{A}$.) But, then, $\phi$ is simply the 
conjunction of the global conditions in $\mathbf{A}$. For any tuple $t$ of the gTST of $\mathbf{A}$, the 
g-multitable $(\mathrm{inline}^{-1}(t), \phi)$ is equivalent to $(\mathrm{inline}^{-1}(t), \lambda(t))$. □

Proviso. We will in the following write gTSTs as pairs $(W, \phi)$, where $W$ is the 
table and $\phi$ is a single global condition shared by the tuples of $W$. 

3.2 World-set Decompositions 

We are now ready to define world-set decompositions, our main vehicle for effi- 
cient yet expressive representation systems. 

A product $m$-decomposition of a relation $R$ is a set of non-nullary relations 
$\{C_1, \ldots, C_m\}$ such that $C_1 \times \cdots \times C_m = R$. The relations $C_1, \ldots, C_m$ are called 
components. A product $m$-decomposition of $R$ is maximal(ly decomposed) if there 
is no product $n$-decomposition of $R$ with $n > m$. 

Definition 4 (attribute-level gWSD). Let $(W, \phi)$ be a gTST. Then an 
attribute-level world-set $m$-decomposition ($m$-gWSD) of $(W, \phi)$ is a pair of a product 
$m$-decomposition of $W$ together with the global condition $\phi$. □

We also consider two important simplifications of gWSDs, those without 
global condition, called vWSDs, and vWSDs without variables, called WSDs. 
An example of a WSD is shown in Figure 4. 



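[Figure 4: four worlds over a relation R with attributes A and B, built from the tuples (1, 2), (3, 4), and (5, 6) with tuple identifiers d1 and d2, and a 2-WSD given as a relational product of two components, the second with schema (R.d2.A, R.d2.B).]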

Fig. 4. Set of four worlds and a corresponding 2-WSD. 
The semantics of a gWSD is given by its exact correspondence with a gTST, 

$$\mathrm{rep}(\{C_1, \ldots, C_m\}, \phi) := \mathrm{rep}(C_1 \times \cdots \times C_m, \phi).$$
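For the variable-free case this semantics can be sketched in Python. The sketch makes several assumptions for illustration: there is a single relation of fixed arity, the components are listed in schema order so that concatenating one tuple per component gives an inlined world, `None` stands for the padding symbol, and the two example components are hypothetical.

```python
from itertools import product

BOTTOM = None  # stands for the padding symbol

def wsd_worlds(components, arity):
    """Worlds of a variable-free WSD over a single relation.

    Picking one tuple per component and concatenating the picks yields one
    row of the product C_1 x ... x C_m, which is then un-inlined.
    """
    worlds = set()
    for choice in product(*components):
        flat = [v for part in choice for v in part]
        rows = [tuple(flat[i:i + arity]) for i in range(0, len(flat), arity)]
        worlds.add(frozenset(r for r in rows if BOTTOM not in r))
    return worlds

# Hypothetical 2-WSD: the first component describes tuple d1, the second d2.
c1 = [(1, 2), (3, 4)]
c2 = [(5, 6), (BOTTOM, BOTTOM)]
worlds = wsd_worlds([c1, c2], arity=2)
print(len(worlds))  # 4
```

Two components with two tuples each encode four worlds, two of which contain only one tuple because the second component contributes padding.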
To decompose $W$, we treat its variables and the $\bot$-value as constants. Clearly, 
the g-tabset $\mathbf{A}$ and any gWSD of $\mathbf{A}$ represent the same set of possible worlds. 
It immediately follows from the definition of WSDs that 

Proposition 6. Any finite set of possible worlds can be represented as a 1-WSD. 

Corollary 1. WSDs are a strong representation system for any relational query 
language. 

In the case of infinite world-sets, however, the mere extension of WSDs with 
variables and equalities does not suffice to make them strong. The lack of power 
to express negated equalities, despite the ability to express disjunction, keeps 
vWSDs (and thus equally v-tabsets) from being strong in the case of infinite 
world-sets. 

Proposition 7. vWSDs are a strong representation system for projection, prod- 
uct and union, and are not a strong representation system for selection and 
difference. 

Proof. We show that v-tabsets are a strong representation system for projection, 
product and union but not for selection and difference. From the equivalence of 
v-tabsets and vWSDs (each v-tabset is a 1-vWSD) the property also holds for 
vWSDs. 

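Let $\mathbf{T} = \{T_1, \ldots, T_n\}$ be a v-tabset of multitables over schema $\Sigma$. The results 
of the operations projection $\pi_U(R_1)$, product $R_1 \times R_2$ and union $R_1 \cup R_2$ on $\mathbf{T}$, 
respectively, (with $R_1, R_2 \in \Sigma$) are then defined as 

$$\begin{aligned}
\pi_U(R_1)(\mathbf{T}) &= \{R' \mid T_i \in \mathbf{T},\ R' = \pi_U(R_1^{T_i})\} \\
(R_1 \cup R_2)(\mathbf{T}) &= \{R' \mid T_i \in \mathbf{T},\ R' = R_1^{T_i} \cup R_2^{T_i}\} \\
(R_1 \times R_2)(\mathbf{T}) &= \{R' \mid T_i \in \mathbf{T},\ R' = R_1^{T_i} \times R_2^{T_i}\}
\end{aligned}$$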
To show that v-tabsets are not strong for selection and difference we consider 
a v-tabset consisting of the following v-multitable {R, S): 
Consider the selection $\sigma_{A=1}(R)$. The answer world-set $\mathbf{W}$ consists of the world 
$\{(A : 1, B : 2), (A : 1, B : 1)\}$ in case $x = 1$, and the worlds $\{(A : 1, B : c)\}$, where 
$c \in D - \{1\}$, in case $x \neq 1$. We prove by contradiction that there is no v-tabset 
representing precisely the world-set $\mathbf{W}$. Since $\mathbf{W}$ is an infinite world-set and a 
v-tabset consists of only finitely many v-tables, there must be at least one v-table 
$T$ that represents infinitely many worlds of the form $\{(A : 1, B : c)\}$, $c \in D$, and 
$\mathrm{rep}(T) \subseteq \mathbf{W}$. Since all tuples in a world of $\mathbf{W}$ have 1 as a value for $A$, all tuples 
in $T$ must have it too, otherwise $T$ will represent worlds that are not in $\mathbf{W}$. 
Also, to represent infinitely many worlds, $T$ must contain at least one variable. 
Thus $T$ consists of v-tables with tuples of the form $(A : 1, B : y)$, where for at 
least one such tuple $y$ is a variable. But then for $y \mapsto 3$, a v-table containing 
$(A : 1, B : y)$ with variable $y$ must not contain any other tuple whose 
instantiation is different from $(A : 1, B : 3)$, as there are no worlds in $\mathbf{W}$ containing 
$(A : 1, B : 3)$ and other different tuples. This implies that for $y \mapsto 1$, $\mathbf{W}$ has 
either a world $\{(A : 1, B : 1)\}$ (in case of v-tables with one tuple $(A : 1, B : y)$), 
or a world $\{(A : 1, B : 1), (A : 1, B : 3)\}$ (in case of v-tables with several more 
tuples). Contradiction. 

Consider now the difference $R - S$. The answer world-set $\mathbf{W}'$ consists of the 
world $\{(A : 1, B : 2)\}$ in case $x = 1$, and the worlds $\{(A : c, B : 2), (A : 1, B : c)\}$, 
$c \in D - \{1\}$, in case $x \neq 1$. We prove by contradiction that there is no v-tabset 
representing precisely the world-set $\mathbf{W}'$. Using arguments similar to the above 
case of selection, the answer v-tabset consists of v-tables that have (possibly 
many) tuples of the form $\{(A : y, B : 2), (A : 1, B : y)\}$, where $y$ is a variable for 
at least one pair of such tuples. But then, for $y \mapsto 1$, there are worlds that contain 
$\{(A : 1, B : 2), (A : 1, B : 1)\}$ and these worlds are not in $\mathbf{W}'$. Contradiction. □

We will later see that, in contrast to vWSDs, gWSDs are a strong represen- 
tation system for any relational language, because they capture c-multitables 
(Theorem O. 

Remark 3. Verifying nondeterministically that a structure $A$ is a possible world 
of gWSD $(\{C_1, \ldots, C_m\}, \phi)$ is easy: all we need to do is choose one tuple from each 
of the component tables $C_1, \ldots, C_m$, concatenate them into a tuple $t$, and check 
whether a valuation exists that satisfies $\phi$ and takes $\mathrm{inline}^{-1}(t)$ to $A$. □

The vWSDs are already exponentially more succinct than the v-tabsets, as 
is easy to verify. 

Proposition 8. Any v-tabset representation of the WSD 



where the $a_i, b_i$ are distinct domain values takes space exponential in $n$. 
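The source of the exponential gap can be illustrated with a hypothetical WSD of $n$ single-attribute components, each holding two distinct values; this shape is an assumption made for the sketch. Since the worlds are variable-free and pairwise distinct, a v-tabset needs a separate v-table for each of them.

```python
from itertools import product

def count_worlds(n):
    """Size of a hypothetical n-component WSD versus the number of its worlds."""
    # Component i holds two one-field tuples, (a_i,) and (b_i,), all values distinct.
    components = [[(f"a{i}",), (f"b{i}",)] for i in range(1, n + 1)]
    worlds = {
        tuple(v for part in choice for v in part)
        for choice in product(*components)
    }
    wsd_size = sum(len(c) for c in components)   # number of stored component tuples
    return wsd_size, len(worlds)

print(count_worlds(10))  # (20, 1024)
```

The decomposition grows linearly in $n$ (2n component tuples) while the number of distinct worlds, and hence of required v-tables, grows as $2^n$.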




By a similar argument, v(resp.,g)WSDs are exponentially more succinct than 
v(g-)TSTs. Succinct attribute-level representations have a rather high price: 



Theorem 1. Given an attribute-level (g)WSD $\mathcal{W}$, checking whether the empty 
world is in $\mathrm{rep}(\mathcal{W})$ is NP-complete. 

Proof. To prove this, we show that the problem is in NP for attribute-level 
gWSDs and NP-hard for attribute-level WSDs. 

Let $\mathcal{W} = (\{C_1, \ldots, C_n\}, \phi)$ be a gWSD. The problem is in NP since we 
can nondeterministically check whether there is a choice of component tuples 
$t_1 \in C_1, \ldots, t_n \in C_n$ such that $t_1 \circ \cdots \circ t_n$ represents the empty world. 

The proof of NP-hardness is by reduction from Exact Cover by 3-Sets (X3C) 
[H]. Given a finite set X of size \X\ = 3q and a set C of three-element subsets 
of X, does C contain a subset C" such that every element of X occurs in exactly 
one member of C"? 

Construction. We construct an attribute-level WSD {C_1, ..., C_q} as fol-
lows. Let C_i be a table of schema C_i[d_1.A_i, ..., d_|X|.A_i] with tuples
(d_1.A_i : a_1, ..., d_|X|.A_i : a_|X|) for each S ∈ C such that a_j = ⊥ if j ∈ S and
a_j = 1 otherwise.

Correctness. This is straightforward to show, but note that each tuple of a
component relation contains exactly three ⊥ symbols. The WSD represents a set
of worlds in which each one contains, naively, up to 3 · q tuples. The composition
of q component tuples w_1 ∈ C_1, ..., w_q ∈ C_q can only represent the empty world
if the ⊥ symbols in w_1, ..., w_q do not overlap. This guarantees that w_1 ∘ ... ∘ w_q
represents the empty set only if the sets from C corresponding to w_1, ..., w_q
form an exact cover of X. □
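The reduction can be replayed on small inputs. The following is a minimal sketch, not part of the original proof: it assumes a component is stored as a list of local worlds, each a dictionary from element j to the field value of tuple d_j, with `None` standing for ⊥. The helper names `build_components` and `empty_world_choice` are hypothetical, and the search is brute force (exponential), as expected for an NP-hard problem.

```python
from itertools import product

BOTTOM = None  # stands for the ⊥ symbol (representation chosen for this sketch)

def build_components(universe, triples):
    """Build q = |X|/3 identical components. Each component has one local
    world per set S in C: field d_j is ⊥ if j is in S and 1 otherwise."""
    q = len(universe) // 3
    local_worlds = [{j: (BOTTOM if j in s else 1) for j in universe}
                    for s in triples]
    return [local_worlds] * q

def empty_world_choice(universe, components):
    """Return one local-world index per component whose composition is the
    empty world, or None if no such choice exists."""
    for choice in product(*(range(len(c)) for c in components)):
        # Tuple d_j is absent from the world iff at least one of its fields is ⊥.
        if all(any(components[i][k][j] is BOTTOM for i, k in enumerate(choice))
               for j in universe):
            return choice
    return None

X = range(1, 10)
C = [{1, 5, 9}, {2, 5, 8}, {3, 4, 6}, {2, 7, 8}, {1, 6, 9}]
choice = empty_world_choice(X, build_components(X, C))
```

On this input (the sets used in Example 5) the chosen local worlds correspond to the first, third and fourth sets of C, which together cover X exactly.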

Example 5. We give an example of the previous reduction from X3C to test-
ing whether the empty world is in the representation of a WSD. Let X =
{1, 2, 3, 4, 5, 6, 7, 8, 9} and let C = {{1, 5, 9}, {2, 5, 8}, {3, 4, 6}, {2, 7, 8}, {1, 6, 9}}.
Then the WSD {C_1, C_2, C_3} with each C_i the table

  d1.Ai  d2.Ai  d3.Ai  d4.Ai  d5.Ai  d6.Ai  d7.Ai  d8.Ai  d9.Ai
    ⊥      1      1      1      ⊥      1      1      1      ⊥
    1      ⊥      1      1      ⊥      1      1      ⊥      1
    1      1      ⊥      ⊥      1      ⊥      1      1      1
    1      ⊥      1      1      1      1      ⊥      ⊥      1
    ⊥      1      1      1      1      ⊥      1      1      ⊥

for 1 ≤ i ≤ 3 represents the empty world, because every tuple d_j has a ⊥ symbol
for some attribute in the result of combining the first tuple of C_1, the third
tuple of C_2, and the fourth tuple of C_3. Therefore, the first, third and fourth
sets in C are an exact cover of X. □

It follows that the problem of deciding whether the q-ary tuple (1, ..., 1) or
whether the world containing just that tuple is uncertain is NP-complete. Note
that this NP-hardness is a direct consequence of the succinctness increase in 
gWSDs as compared to gTSTs. On gTSTs, checking for the empty world is a 
trivial operation. 



Corollary 2. Tuple certainty is coNP-hard for attribute-level WSDs. 



This problem remains in coNP even for general gWSDs. Nevertheless, since 
computing certain answers is a central task related to incomplete information, 
we will consider also the following restriction of gWSDs. As we will see, this 
alternative definition yields a representation system in which the tuple and in- 
stance certainty problems are in polynomial time while the formalism is still 
exponentially more succinct than gTSTs. 

Definition 5 (gWSD). An attribute-level gWSD is called a tuple-level gWSD
if for any two attributes A_i, A_j from the schema of relation R, and any tuple
id d, the attributes R.d.A_i, R.d.A_j of the component tables are in the same
component schema. □

In other words, in tuple-level gWSDs, values for one and the same tuple
cannot be split across several components - that is, here the decomposition is
less fine-grained than in attribute-level gWSDs. In the remainder of this article,
we will exclusively study tuple-level (g-, resp. v-)WSDs, and will refer to them
simply as (g-, v-)WSDs. Obviously, tuple-level (g)WSDs are just as expressive
as attribute-level (g)WSDs, since they all are just decompositions of 1-(g)WSDs.

However, tuple-level (g)WSDs are less succinct than attribute-level (g)WSDs. 
For example, any tuple-level WSD equivalent to the attribute-level WSD 



must be exponentially larger. Note that the WSDs of Proposition 8 are tuple-
level.

4 Main Expressiveness Result 

In this section we study the expressive power of gWSDs. We show that gWSDs 
and c-multitables are equivalent in expressive power, that is, for each gWSD 
one can find an equivalent c-multitable that represents the same set of possible 
worlds and vice versa. 

Theorem 2. gWSDs capture the c-multitables. 

Thus gWSDs form a strong representation system for relational algebra. 

Corollary 3. gWSDs are a strong representation system for relational algebra. 

We prove that gWSDs capture the c-multitables by providing a translation of 
gWSDs into x-multitables, a syntactically restricted form of c-multitables, and 
a translation of c-multitables into gWSDs. 




Lemma 1. Any gWSD has an equivalent x-multitable of polynomial size.

Fig. 5. Translating gWSDs into x-multitables: x-table (b) is equivalent to gWSD
(a).

Proof. Let W = ({C_1, ..., C_m}, φ) be a (tuple-level) m-gWSD that encodes a
g-tabset A over relational schema (R_1[U_1], ..., R_k[U_k]).

Construction. We define a translation f from W to an equivalent c-multitable
T = (R_1^T, ..., R_k^T, φ^T, λ^T) in the following way.

In case a component C_j of W is empty, then W represents the empty world-
set and is equivalent to any x-multitable with an unsatisfiable global condition,
i.e., x ≠ x. We next consider the case when all components of W are non-empty.

1. The global condition φ of W becomes the global condition φ^T of the x-
multitable T.

2. For each relation schema R_i[U_i] we create a table R_i^T with the same schema.

3. We construct a mutex set M({x_1, ..., x_m}, μ) with μ(x_j) = |C_j| − 1 that has
a new variable x_j for each component C_j of W. For each local world w_i ∈ C_j
(with 1 ≤ i ≤ |C_j|) we create a conjunction

   cond(w_i) = true                                  if μ(x_j) = 0,
   cond(w_i) = (x_j = i)                             if 1 ≤ i ≤ μ(x_j),
   cond(w_i) = (x_j ≠ 1) ∧ ... ∧ (x_j ≠ μ(x_j))      if i = μ(x_j) + 1.

Clearly, any valuation of x_j satisfies precisely one conjunction cond(w_i). Let
d be a tuple identifier for a relation R defined in C_j, and t be the tuple for
d in w_i. If t is not a t⊥-tuple, then we add t with local condition λ^T(t) to
R^T, where R^T is the corresponding table from the x-multitable and λ^T(t)
is the conjunction cond(w_i).
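The mutex conditions of step 3 can be checked mechanically. The sketch below is illustrative only: it assumes a conjunction is represented as a list of atoms (variable, operator, constant), with the empty list standing for `true`, and it uses the three-case reading of cond(w_i) that is consistent with the mutex set of Example 6. The names `cond` and `holds` are hypothetical.

```python
def cond(var, i, size):
    """Conjunction attached to the i-th local world (1-based) of a component
    with `size` local worlds; mu = size - 1 as in the mutex set."""
    mu = size - 1
    if mu == 0:
        return []                         # empty conjunction, i.e. true
    if i <= mu:
        return [(var, "=", i)]            # x_j = i
    return [(var, "!=", l) for l in range(1, mu + 1)]  # x_j differs from 1..mu

def holds(conjunction, valuation):
    """Evaluate a conjunction of (in)equalities under a total valuation."""
    return all((valuation[v] == c) if op == "=" else (valuation[v] != c)
               for v, op, c in conjunction)

def satisfied_worlds(size, value):
    """Indices of the local worlds whose condition holds when x_j = value."""
    return [i for i in range(1, size + 1)
            if holds(cond("x", i, size), {"x": value})]
```

For every component size and every value of the mutex variable, `satisfied_worlds` returns exactly one index, which is the property the translation relies on.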



Example 6. Consider the 1-gWSD ({C_1}, φ) given in Figure 5(a). The first tuple
of C_1 encodes a g-table R with a single tuple (with identifier d_1), and the second
tuple of C_1 encodes two v-tuples with identifiers d_1 and d_2. The encoding of the
gWSD as an x-table T with global condition φ is given in Figure 5(b). The
local conditions of tuples in T are conjunctions from a mutex set M({x_1}, μ) =
{true, (x_1 = 1), (x_1 ≠ 1)}, where μ(x_1) = 1. Our translation relies on the fact
that any valuation of the mutex variables satisfies precisely one (in)equality
for each mutex variable. For instance, if the first tuple of T had the
local condition x_1 = 2, then a valuation {x_1 ↦ 2} would wrongly allow worlds
containing instantiations of the first two tuples of T, although this is forbidden
by our gWSD. □

Correctness. Take the g-tabset A represented by W:

   A = { (inline^{-1}(w_1 × ... × w_m), φ) | ⋀_{j=1..m} (w_j ∈ C_j) }.

We create a g-tabset A' that consists of the g-multitables of A with global
conditions φ enriched by conjunctions from our mutex set M such that precisely
one of these conjunctions is true for any valuation of the mutex variables. We
consider then a new global condition
φ_(w_1,...,w_m) := φ ∧ cond(w_1) ∧ ... ∧ cond(w_m)
for each g-multitable B_(w_1,...,w_m) defined by inline^{-1}(w_1 × ... × w_m) with initial
global condition φ.

Clearly, A' is equivalent to A, because there is a one-to-one mapping between
g-multitables of A and of A', respectively. A choice of a g-multitable from A,
or any world it represents, is then precisely mapped to its corresponding g-
multitable from A', or world, under an appropriate assignment of the mutex
variables. This also holds for the other direction.

We next show that rep(A') = rep(T). 

Any total valuation ν over the mutex variables x_1, ..., x_m is identity on φ
and satisfies precisely one conjunction cond(w_1) ∧ ... ∧ cond(w_m):

   ν(A') = {(ν(B_(v_1,...,v_m)), ν(φ_(v_1,...,v_m))) | 1 ≤ j ≤ m, v_j ∈ C_j} = (B_(w_1,...,w_m), φ).

Let B = B_(w_1,...,w_m) for short. It remains to show that rep(B) = rep(ν(T)).

(⊆) The translation f maps each tuple of a table R_i^B to an identical tuple in
R_i^T, where R_i ∈ {R_1, ..., R_k}. We also have ν(φ) = φ = φ^T. Thus R_i^B ⊆ R_i^T in
each world represented by T.

(⊇) Assume there is a tuple t ∈ ν(R_i^T) such that t ∉ R_i^B. The translation f
ensures that t comes from a combination of local worlds (c_1, ..., c_m), which cor-
responds to a g-multitable B' with global condition φ ∧ cond(c_1) ∧ ... ∧ cond(c_m).
We thus have that ν(cond(c_1) ∧ ... ∧ cond(c_m)) = true for t to be defined by
B'. However, there is only one combination of local worlds with this property,
namely (w_1, ..., w_m), which defines B. Contradiction.

Complexity. By construction, the translation f is the identity for global
conditions and maps each tuple t defined by a component of W and different
from t⊥ to precisely one tuple of a table of T with local condition of polynomial
size. The translation f is thus polynomial. □

For the other, somewhat more involved direction, we first show that c-
multitables can be translated into equivalent g-tabsets. That is, disjunction on
the level of entire tables plus conjunctions of negated equalities as global condi-
tions, as present in g-tables, are enough to capture the full expressive power of
c-tables. In particular, we are able to eliminate all local conditions.

Proposition 9. Any c-multitable has an equivalent g-tabset. 

Proof. Let T = (R_1^T, ..., R_k^T, φ^T, λ^T) be a c-multitable over relational schema
(R_1[U_1], ..., R_k[U_k]); φ^T is the global condition and λ^T maps each tuple to its
local condition. Let X_T and D_T be the set of all variables and the set of all
constants appearing in the c-multitable, respectively.

Construction. We construct a g-tabset G with g-multitables over the same
schema (R_1[U_1], ..., R_k[U_k]) as follows. We consider comparisons of the form
τ = τ' and τ ≠ τ' where τ, τ' ∈ X_T ∪ D_T are variables or constants from
the c-multitable. We compute the set of all consistent
Θ = ⋀{τ θ_(τ,τ') τ' | τ, τ' ∈ X_T ∪ D_T}
where θ_(τ,τ') ∈ {=, ≠} for all τ, τ' and Θ ⊨ φ^T. Note that the equalities
in Θ define an equivalence relation on X_T ∪ D_T. In particular, we take into
account that c = c' is consistent iff c and c' are the same constant. We denote by
[x_i]_= the equivalence class of a variable x_i with respect to the equalities given
by Θ and by h([x_i]_=) the representative element of that equivalence class (e.g.
the first element with respect to any fixed order of the elements in the class).

For each Θ, we construct a g-multitable G_Θ in G. Each tuple t from a table
R_i^T becomes a tuple in R_i^{G_Θ} if Θ ⊨ λ^T(t). The global condition of G_Θ is Θ. To
strictly adhere to the definition of g-multitables, we remove the equalities from
Θ and enforce them in the tables R_1^{G_Θ}, ..., R_k^{G_Θ}. In case of a tuple (x_1, ..., x_n),
we replace x_i by c in case c ∈ D_T, Θ ⊨ (x_i = c), and by h([x_i]_=) in case
∀c ∈ D_T : Θ ⊨ (x_i ≠ c).

Fig. 6. Translating c-tables into 1-gWSDs. (b) An optimized 1-gWSD equiva-
lent to c-table T. (c) 1-gWSD equivalent to c-table T. The Θ's are given here for
clarity.
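The enumeration of consistent Θ's in the construction can be sketched as an enumeration of equivalence relations in which no two distinct constants share a class. The code below is a minimal illustration under assumptions that are not in the original: variables are strings, constants are distinct non-string values, a Θ is returned as its list of equivalence classes, and the global condition is given as a list of atoms (term, "=" or "!=", term). The function names are hypothetical.

```python
def entails(theta, atom):
    """A complete Theta entails a = b iff a and b share a class,
    and a != b iff they do not."""
    a, op, b = atom
    same = any(a in cls and b in cls for cls in theta)
    return same if op == "=" else not same

def consistent_thetas(variables, constants, global_cond=()):
    """List all equivalence relations on variables and constants that keep
    distinct constants apart and entail every atom of the global condition."""
    elems = list(constants) + list(variables)
    classes = []

    def extend(i):
        if i == len(elems):
            theta = [frozenset(c) for c in classes]
            if all(entails(theta, atom) for atom in global_cond):
                yield theta
            return
        if i >= len(constants):           # a variable may join any existing class
            for cls in classes:
                cls.append(elems[i])
                yield from extend(i + 1)
                cls.pop()
        classes.append([elems[i]])        # constants always open a new class
        yield from extend(i + 1)
        classes.pop()

    return list(extend(0))

thetas = consistent_thetas(["x", "y"], [1, 2], [("x", "!=", 1)])
```

The number of Θ's grows exponentially with the number of variables, which matches the fact that the proposition claims equivalence, not a polynomial-size translation.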

Correctness. Clearly, the g-tabset G consists of a finite number of g-multitables,
because the finite number of variables and constants in T induces finitely many
consistent Θ's. We next show that rep(G) = rep(T).

(⊆) Consider a world A represented by a g-multitable G_Θ ∈ G for a conjunction
Θ. For simplicity, we consider the (equivalent) multitable where the equalities are
not removed from Θ and also not propagated in the g-tables. By construction,
Θ ⊨ φ^T and a tuple t is in a table R_i^{G_Θ} if it occurs in a table R_i^T such that
Θ ⊨ λ^T(t). Thus we necessarily have that A ∈ rep(T).

(⊇) Consider a world A ∈ rep(ν(T)) defined by a total valuation ν consistent
with φ^T. Because ν and Θ talk about the same set of variables and there is a
Θ for each possible (in)equality on any two variables or variable and constants
that are consistent with φ^T, there exists a consistent Θ such that Θ ⊨ ν. Let
G_Θ be the g-multitable in G for our chosen Θ. Take now any tuple t in a table
R_i^T such that ν(λ^T(t)) = true. Then, because Θ ⊨ ν we have Θ ⊨ λ^T(t) and
t ∈ R_i^{G_Θ}. Thus A ∈ rep(G_Θ) ⊆ rep(G). □

Any g-tabset can be inlined into a g-TST, which, by the definition of gWSDs, 
represents a 1-gWSD. It then follows that 

Lemma 2. Any c-multitable has an equivalent gWSD. 



Example 7. Figure 6(a) shows a c-table T. Following the construction from the
proof of Proposition 9, we create nine consistent Θ's and one g-table for each
of them. Figure 6(c) shows the Θ's and an inlining of all these g-tables into a
gTST. The gTST is normalized by creating one common global condition. This
gTST with a global condition of inequalities is in fact a 1-gWSD. Figure 6(b)
shows a simplified version of our 1-gWSD, where duplicate tuples are removed
and some different tuples are merged. For instance, the tuple for Θ_4 is equal to
the tuple for Θ_1 and can be removed. Also, by merging the tuples for Θ_2 and
Θ_3 we also allow y to take value 1 and thus we eliminate the inequality y ≠ 1
from the global condition φ. □

As a corollary of Lemma 1, x-multitables, a syntactically restricted form of c-
multitables, are at least as expressive as gWSDs. Conversely, by Lemma 2, gWSDs
are at least as expressive as c-multitables. This implies that

Corollary 4. The x-multitables capture gWSDs and thus c-multitables.

To sum up, we can chart the expressive power of the representation systems
considered in this paper as follows. As discussed in Section 3, v-multitables
are less expressive than finite sets of v-multitables (or v-tabsets), which are
syntactic variations of vTSTs. The vWSDs (resp., gWSDs) are equally expressive
to v(g)TSTs yet exponentially more succinct (Proposition 8). The gWSDs are
more expressive than vWSDs because gWSDs can represent the answers to any
relational algebra query, whereas vWSDs cannot represent answers to queries
with selections or difference. Finally, c-multitables are captured by their syntactic
restriction called x-multitables and also by gWSDs.

5 Complexity of Managing gWSDs 

We consider the data complexity of the decision problems defined in Section 1.
Note that in the literature the tuple (q-)possibility and (q-)certainty problems
are sometimes called bounded or restricted (q-)possibility, and (q-)certainty re-
spectively, and the instance (q-)possibility and (q-)certainty are sometimes called
(q-)membership and (q-)uniqueness [3]. A comparison of the complexity results
for these decision problems in the context of gWSDs to those of c-tables [3] and
Trio [8] is given in Table 2.

Fig. 7. Example of a 3-gWSD and an equivalent x-multitable. (a) 3-gWSD
W = ({C_1, C_2, C_3}, true).

5.1 Tuple (q)-possibility 

We first prove complexity results for tuple q-possibility in the context of x-tables.
This is particularly relevant as gWSDs can be translated in polynomial time into
x-tables, as done in the proof of Lemma 1.

Lemma 3. Tuple q-possibility is in PTIME for x-tables and positive relational 
algebra. 

Proof. Recall from Definition and Proposition [3] that x-tables are closed un- 
der positive relational algebra and the evaluation of positive relational algebra 
queries on x-tables is in PTIME. 

Consider a constant tuple t and a fixed positive relational query Q, both over
schema U, and two x-multitables T and T' such that T' = Q(T).

In case the global condition of T' is unsatisfiable, then T' represents the
empty world-set and t is not possible. The global condition is a conjunction of
negated equalities and we can check its unsatisfiability in PTIME. We consider
next the case of satisfiable global conditions. Following the semantics of x-tables,
the tuple t is possible in T' iff there is a tuple t' in T' and a valuation ν consistent
with the global and local conditions such that t' equals t under ν. This can be
checked for each T'-tuple individually and in PTIME. □
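The final check of this proof, applied tuple by tuple, can be sketched with a union-find structure. The code is a simplified illustration, not the authors' procedure: it assumes an infinite domain, variables written as strings, constants as non-string values, conditions given as lists of atoms (term, "=" or "!=", term), and an x-table given as a list of (fields, local condition) pairs. The names `satisfiable` and `tuple_possible` are hypothetical.

```python
def satisfiable(atoms):
    """Decide a conjunction of equalities and inequalities over an infinite
    domain: merge equal terms, then look for a clash."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for a, op, b in atoms:
        if op == "=":
            parent[find(a)] = find(b)
    constant_of = {}
    for a in list(parent):
        if not isinstance(a, str):          # a constant
            root = find(a)
            if constant_of.setdefault(root, a) != a:
                return False                # two distinct constants were equated
    return all(find(a) != find(b) for a, op, b in atoms if op == "!=")

def tuple_possible(t, xtable, global_cond):
    """t is possible iff some x-table tuple can be instantiated to t under a
    valuation consistent with its local condition and the global condition."""
    if not satisfiable(global_cond):
        return False                        # empty world-set
    for fields, local_cond in xtable:
        if len(fields) != len(t):
            continue
        binding = [(f, "=", c) for f, c in zip(fields, t)]
        if satisfiable(binding + list(local_cond) + list(global_cond)):
            return True
    return False

xtable = [((1, "x"), [("m", "=", 1)]),
          (("y", 2), [("m", "!=", 1)])]
global_cond = [("x", "!=", 3)]
```

Each tuple of the x-table is examined independently, and each test is a near-linear pass over the atoms, in line with the PTIME bound of the lemma.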

Theorem 3. Tuple q-possibility is in PTIME for gWSDs and positive relational 
algebra. 

Proof. This follows from the polynomial time translation of gWSDs into x-
multitables ensured by Lemma 1 and the PTIME result for x-multitables given
in Lemma 3. □

For full relational algebra, tuple q-possibility becomes NP-hard even for v- 
tables where each variable occurs at most once (also called Codd tables) [3] . 

Theorem 4. Tuple q-possibility is in NP for gWSDs and relational algebra and 
NP-hard for WSDs and relational algebra. 

Proof. Tuple q-possibility is in NP for gWSDs and relational algebra because 
gWSDs can be translated polynomially into c-tables (see Lemma [1]) and tuple 
q-possibility is in NP for c-tables and relational algebra [3] . 

We show NP-hardness for WSDs and relational algebra by a reduction from
3CNF-satisfiability [15]. Given a set Y of propositional variables and a set of
clauses c_i = c_{i,1} ∨ c_{i,2} ∨ c_{i,3} such that for each i, k, c_{i,k} is x or ¬x for some x ∈ Y,
the 3CNF-satisfiability problem is to decide whether there is a satisfying truth
assignment for ⋀_i c_i.

Construction. We create a WSD W = (C_1, ..., C_|Y|, C_S) representing
worlds of two relations R and S over schemas R(C) and S(C), respectively,
as follows.¹ For each variable x_i in Y we create a component C_i with two lo-
cal worlds, one for x_i and the other for ¬x_i. For each literal c_{i,k} we create an
R-tuple (i) with id d_{i,k}. In case c_{i,k} = x_j or c_{i,k} = ¬x_j, then the schema of C_j
contains the attribute R.d_{i,k}.C and the local world for x_j or ¬x_j, respectively,
contains the values (i) for these attributes. All component fields that remained
unfilled are finally filled in with ⊥-values. The additional component C_S has n
attributes S.d_1.C, ..., S.d_n.C and one local world (1, ..., n). Thus, by construc-
tion, S^A = {(C : 1), ..., (C : n)} and R^A ⊆ S^A in all worlds A defined by W.

The problem of deciding whether ⋀_i c_i has a satisfying truth assignment is
equivalent to deciding whether the nullary tuple () is possible in the answer to
the fixed query Q := {()} − π_∅(S − R), with S and R defined by W.

Correctness. Clearly, () is possible in the answer to Q iff there is a world
A ∈ rep(W) where π_∅(S − R) is empty, or equivalently S − R is empty. Because
by construction R ⊆ S in all worlds defined by W, we further refine our condition
to ∃A ∈ rep(W) : S^A = R^A. We next show that ⋀_i c_i has a satisfying truth
assignment exactly when ∃A ∈ rep(W) : S^A = R^A.

First, assume there is a truth assignment ν of Y that proves ⋀_i c_i is satisfi-
able. Then, ν(c_i) is true for each clause c_i. Because each clause c_i is a disjunction,
this means there is at least one c_{i,k} for each c_i such that ν(c_{i,k}) is true.

Turning to W, ν represents a choice of local worlds of W such that for each
variable x_j ∈ Y, if ν(x_j) = true then we choose the first local world of C_j and
if ν(x_j) = false then we choose the second local world of C_j. Let w_j be the
choice for C_j and let w_{C_S} be the only choice for C_S. Then, W defines a world
A = inline^{-1}(w_1 × ... × w_|Y| × w_{C_S}) and R^A contains those tuples defined in
the chosen local worlds. Because there is at least one c_{i,k} per clause c_i such that
ν(c_{i,k}) is true, there is also a local world w_j that defines R-tuple (C : i) for each
c_i. Thus R^A = S^A.

Now, assume there exists a world A ∈ rep(W) such that S^A = R^A. Thus
R^A = {(C : 1), ..., (C : n)} and there is a choice of local worlds of the compo-
nents in W that define all R-tuples (C : 1) through (C : n). By construction,
this choice corresponds to a truth assignment ν that maps at least one literal
c_{i,k} of each c_i to true. Thus ν is a satisfying truth assignment of ⋀_i c_i. □
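The reduction can be tried out on small clause sets. The sketch below is illustrative and makes several assumptions that are not in the original: literals are signed integers (j for x_j, -j for ¬x_j), a local world is stored simply as the set of clause indices i whose R-tuple (C : i) it defines (the ⊥ fields are left implicit), and the search over choices of local worlds is brute force. The names `encode_cnf` and `world_with_r_equal_s` are hypothetical.

```python
from itertools import product

def encode_cnf(clauses, num_vars):
    """components[j-1] = (local world for x_j, local world for ¬x_j); each
    local world is the set of clause indices whose R-tuple it defines."""
    components = [(set(), set()) for _ in range(num_vars)]
    for i, clause in enumerate(clauses, start=1):
        for literal in clause:
            components[abs(literal) - 1][0 if literal > 0 else 1].add(i)
    return components

def world_with_r_equal_s(components, n):
    """Return a choice of local worlds (0 = positive, 1 = negative) whose
    world satisfies R = S = {1, ..., n}, or None if there is none."""
    s = set(range(1, n + 1))
    for choice in product((0, 1), repeat=len(components)):
        r = set().union(*(components[j][w] for j, w in enumerate(choice)))
        if r == s:
            return choice
    return None

# Clause set of Example 8: c1 = x1 ∨ x2 ∨ x3, c2 = x1 ∨ ¬x2 ∨ x4, c3 = ¬x1 ∨ x2 ∨ ¬x4.
clauses = [(1, 2, 3), (1, -2, 4), (-1, 2, -4)]
witness = world_with_r_equal_s(encode_cnf(clauses, 4), len(clauses))
```

A non-`None` witness is exactly a satisfying truth assignment read off the chosen local worlds; here the all-positive choice already works, as in Example 8.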

The construction used in the proof of Theorem 4 can also be used to show that
instance possibility is NP-hard for (tuple-level) WSDs: deciding the satisfiability
of 3CNF is reducible to deciding whether the relation {(C : 1), ..., (C : n)} is a
possible instance of R.

Example 8. Figure 8 gives a 3CNF clause set and its WSD encoding. Checking
the satisfiability of c_1 ∧ c_2 ∧ c_3 is equivalent to checking whether there is a choice
of local worlds in the WSD such that () is possible in the answer to the query
{()} − π_∅(S − R), or, simpler, such that S − R is empty. This also means that
R = {(C : 1), (C : 2), (C : 3)}. For example, {x_1 ↦ true, x_2 ↦ true, x_3 ↦
true, x_4 ↦ true} is a satisfying truth assignment. Indeed, the corresponding
choice of local worlds (C_1 : x_1, C_2 : x_2, C_3 : x_3, C_4 : x_4, C_S : w_{C_S}) defines a world
A in which R^A = S^A. □

3CNF clause set: {c_1 = x_1 ∨ x_2 ∨ x_3, c_2 = x_1 ∨ ¬x_2 ∨ x_4, c_3 = ¬x_1 ∨ x_2 ∨ ¬x_4}

Fig. 8. 3CNF clause set encoded as WSD.

¹ For clarity reasons, we use two relations; they can be represented as one relation
with an additional attribute stating the relation name.

The result for tuple possibility follows directly from Theorem 3, where the
positive relational query is the identity.

Theorem 5. Tuple possibility is in PTIME for gWSDs.

Recall from Table 2 that tuple possibility is NP-complete for c-tables. This
is because deciding whether a tuple is possible requires to check satisfiability of 
local conditions, which can be arbitrary Boolean formulas. 



5.2 Instance (q)-possibility 

Theorem 6. Instance possibility is in NP for gWSDs and NP-hard for WSDs. 

Proof. Let W = ({C_1, ..., C_n}, φ) be a gWSD. The problem is in NP since
we can nondeterministically check whether there is a choice of tuples t_1 ∈
C_1, ..., t_n ∈ C_n such that t_1 ∘ ... ∘ t_n represents the input instance.

We show NP-hardness for WSDs with a reduction from Exact Cover by 3-Sets,
the problem already used in the proof of Theorem 1.

Given a set X with |X| = 3q and a collection C of 3-element subsets of X, the
exact cover by 3-sets problem is to decide whether there exists a subset C' ⊆ C,
such that every element of X occurs in exactly one member of C'.

Construction. The set X is encoded as an instance consisting of a unary
relation I_X over schema I_X[A] with 3q tuples. The collection C is represented
as a WSD W = {C_1, ..., C_q} encoding a relation R over schema R[A], where
C_1, ..., C_q are component relations. The schema of a component C_i is
C_i[R.d_{j+1}.A, R.d_{j+2}.A, R.d_{j+3}.A], where j = 3(i − 1). Each 3-element set
c = {x, y, z} ∈ C is encoded as a tuple (x, y, z) in each of the components C_i.

Fig. 9. Exact cover by 3-sets encoded as WSD.



The problem of deciding whether there is an exact cover by 3-sets of X is
equivalent to deciding whether I_X ∈ rep(W).

Correctness. We prove the correctness of the reduction, that is, we show
that X has an exact cover by 3-sets exactly when I_X ∈ rep(W).

First, assume there is a world A ∈ rep(W) with R^A = I_X. Then there exist
tuples w_i ∈ C_i, 1 ≤ i ≤ q, such that A = rep({w_1} × ... × {w_q}). As I_X and R^A
have the same number of tuples and all elements of I_X are different, it follows
that the values in w_1, ..., w_q are disjoint. But then this means that the elements
in w_1, ..., w_q are an exact cover of X.

Now, assume there exists an exact cover C' = {c_1, ..., c_q} of X. Let w_i ∈ C_i
such that w_i = c_i, 1 ≤ i ≤ q. As the elements c_i are disjoint, the world A =
rep({w_1} × ... × {w_q}) contains exactly 3q tuples. Since C' is an exact cover
of X and each element of X (and therefore of I_X) appears in exactly one local
world w_i, it follows that I_X = R^A. □



Example 9. Consider the set X and the collection of 3-element sets C defined as

X = {1, 2, 3, 4, 5, 6, 7, 8, 9}

C = {{1, 5, 9}, {2, 5, 8}, {3, 4, 6}, {2, 7, 8}, {1, 6, 9}}

The encoding of X and C is given in Figure 9 as WSD W and instance I_X.
A possible cover of X, or equivalently, a world of rep(W) equivalent to I_X, is
the world inline^{-1}(w_1 ∘ w_3 ∘ w_4) or, by resolving the record composition,

inline^{-1}(t_1.A : 1, t_2.A : 5, t_3.A : 9, t_4.A : 3, t_5.A : 4, t_6.A : 6, t_7.A : 2, t_8.A : 7, t_9.A : 8).

□

Theorem 7. Instance q-possibility is NP-complete for gWSDs and relational 
algebra. 

Proof. For the identity query, the problem becomes instance possibility, which is
NP-complete (see Theorem 6). To show it is in NP, we use the PTIME reduction
from gWSDs to c-tables given in Lemma 1 and the NP-completeness result for
instance q-possibility and c-tables [3]. □



5.3 Tuple and instance certainty 



Theorem 8. Tuple certainty is in PTIME for gWSDs. 

Proof. Consider a tuple-level gWSD W = ({C_1, ..., C_m}, φ) and a tuple t. Tuple
t is certain exactly if φ is unsatisfiable or there is a component C_i such that each
tuple of C_i contains t (without variables). To see this, suppose φ is satisfiable and
for each component C_i there is at least one tuple w_i ∈ C_i that does not contain t.
Then there is a world-tuple w ∈ C_1 × ... × C_m such that tuple t does not occur in
w. If there is a mapping θ that maps some tuple in w to t and for which θ(φ)
is true, then there is also a mapping θ' such that θ'(w) does not contain t but
θ'(φ) is true. Thus t is not certain. □
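For the variable-free case with a trivially true global condition, the criterion of this proof reduces to a scan over the components. The sketch below is a simplified illustration under exactly those assumptions: a component is a list of local worlds and a local world is the set of constant tuples it defines (⊥-tuples are simply left out). The function names are hypothetical, and the brute-force variant is included only to cross-check the linear-time criterion.

```python
from itertools import product

def tuple_certain(components, t):
    """t is certain iff some component defines t in every one of its local
    worlds (variable-free WSD, global condition true)."""
    return any(all(t in world for world in comp) for comp in components)

def tuple_certain_bruteforce(components, t):
    """Reference check: enumerate all worlds (exponentially many)."""
    return all(t in set().union(*choice) for choice in product(*components))

components = [
    [{("a",)}, {("a",), ("b",)}],   # every local world of C1 contains ("a",)
    [{("c",)}, set()],              # C2 may contribute nothing at all
]
```

Here `("a",)` is certain because the first component always defines it, while `("b",)` and `("c",)` each fail in at least one local world of the only component that mentions them.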

As shown in Table 2, tuple certainty is coNP-complete for c-tables, as it
requires checking whether local conditions, which can be arbitrary Boolean
formulas, are tautologies.

Theorem 9. Instance certainty is in PTIME for gWSDs. 

Proof. Given an instance I and a gWSD W representing a relation R, the prob-
lem is equivalent to checking for each world A ∈ rep(W) whether (1) I ⊆ R^A
and (2) R^A ⊆ I. Test (1) is reducible to checking whether each tuple from I is
certain in R, and is thus in PTIME (cf. Theorem 8). For (2), we check in PTIME
whether there is a tuple different from t⊥ in some world of rep(W) that is not in
the instance I. If W has variables then it cannot represent certain instances. □

5.4 Tuple and instance q-certainty 

Theorem 10. Tuple and instance q-certainty are in coNP for gWSDs and re- 
lational algebra and coNP-hard for WSDs and positive relational algebra. 

Proof. Tuple and instance q-certainty are in coNP for gWSDs and full rela-
tional algebra because gWSDs can be translated polynomially into c-tables (see
Lemma 1) and tuple and instance q-certainty are in coNP for c-tables and full
relational algebra [3].

We show coNP-hardness for WSDs and positive relational algebra by a re-
duction from 3DNF-tautology [15]. Given a set Y of propositional variables and
a set of clauses c_i = c_{i,1} ∧ c_{i,2} ∧ c_{i,3} such that for each i, k, c_{i,k} is x or ¬x for
some x ∈ Y, the 3DNF-tautology problem is to decide whether ⋁_i c_i is true for
each truth assignment of Y.

Construction. We create a WSD W = (C_1, ..., C_|Y|) representing worlds
of a relation R over schema R(C, P) as follows. For each variable x_i in Y we
create a component C_i with two local worlds, one for x_i and the other for ¬x_i.
For each literal c_{i,k} we create an R-tuple (i, k) with id d_{i,k}. In case c_{i,k} = x_j or
c_{i,k} = ¬x_j, then the schema of C_j contains the attributes R.d_{i,k}.C and R.d_{i,k}.P,
and the local world for x_j or ¬x_j, respectively, contains the values (i, k) for these
attributes. All component fields that remained unfilled are finally filled in with
⊥-values.



3DNF clause set: {c_1 = x_1 ∧ x_2 ∧ x_3, c_2 = x_1 ∧ ¬x_2 ∧ x_4, c_3 = ¬x_1 ∧ x_2 ∧ ¬x_4}

Fig. 10. 3DNF clause set encoded as WSD.



The problem of deciding whether ⋁_i c_i is a tautology is equivalent to decid-
ing whether the nullary tuple () is certain in the answer to the fixed positive
relational algebra query Q := π_∅(σ_φ((R r_1) × (R r_2) × (R r_3))), where

φ := (r_1.C = r_2.C and r_1.C = r_3.C and r_1.P = 1 and r_2.P = 2 and r_3.P = 3).

Correctness. We prove the correctness of the reduction, that is, we show
that ⋁_i c_i is a tautology exactly when ∀A ∈ rep(W) : () ∈ Q^A.

First, assume there is a truth assignment ν of Y that proves ⋁_i c_i is not a
tautology. Then, there exists a choice of local worlds of W such that for each
variable x_i ∈ Y, if ν(x_i) = true then we choose the first local world of C_i and
if ν(x_i) = false then we choose the second local world of C_i. Let w_i be the
choice for C_i. Then, W defines a world A = inline^{-1}(w_1 × ... × w_|Y|) and R^A
contains those tuples defined in the chosen local worlds. If ν proves ⋁_i c_i is not
a tautology, then ν(⋁_i c_i) is false and, because ⋁_i c_i is a disjunction, no clause
c_i is true. Thus, for each clause c_i, R^A does not contain all three tuples (i, 1),
(i, 2), and (i, 3). This means that the condition of Q cannot be satisfied and thus
the answer of Q is empty. Thus the tuple () is not certain in the answer to Q.

Now, assume there exists a world A ∈ rep(W) such that () ∉ Q^A. Then,
R^A does not contain the set of three tuples (i, 1), (i, 2), and (i, 3) for any clause c_i,
because such a triple satisfies the selection condition. This means that the choice
of local worlds of the components in W corresponds to a valuation ν that does not
map all of c_{i,1}, c_{i,2}, and c_{i,3} to true, for any clause c_i. Thus ⋁_i c_i is not a tautology.

Because by construction Q^A is either {} or {()} for any world A ∈ rep(W),
the same proof also works for instance q-certainty with instance {()}. □
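The 3DNF reduction can be exercised in the same style. The sketch below is illustrative, with assumptions that are not in the original: literals are signed integers (j for x_j, -j for ¬x_j), a local world is the set of R-tuples (i, k) it defines, and the query Q is evaluated directly as the test "some clause index c has all of (c, 1), (c, 2), (c, 3) in R". Certainty is checked by brute force over all worlds; the function names are hypothetical.

```python
from itertools import product

def encode_dnf(clauses, num_vars):
    """components[j-1] = (local world for x_j, local world for ¬x_j); a local
    world is the set of R-tuples (clause index, literal position)."""
    components = [(set(), set()) for _ in range(num_vars)]
    for i, clause in enumerate(clauses, start=1):
        for k, literal in enumerate(clause, start=1):
            components[abs(literal) - 1][0 if literal > 0 else 1].add((i, k))
    return components

def query_nonempty(r):
    """Direct evaluation of Q: does the three-way self-join select a triple
    (c, 1), (c, 2), (c, 3) for some clause index c?"""
    return any((c, 2) in r and (c, 3) in r for (c, p) in r if p == 1)

def nullary_tuple_certain(components):
    """() is certain in the answer iff Q is nonempty in every world."""
    return all(
        query_nonempty(set().union(*(components[j][w] for j, w in enumerate(choice))))
        for choice in product((0, 1), repeat=len(components)))

# Clause set of Fig. 10: c1 = x1∧x2∧x3, c2 = x1∧¬x2∧x4, c3 = ¬x1∧x2∧¬x4.
fig10 = encode_dnf([(1, 2, 3), (1, -2, 4), (-1, 2, -4)], 4)
```

For the clause set of Fig. 10 the check fails, since the formula is not a tautology; a degenerate tautology such as (x_1 ∧ x_1 ∧ x_1) ∨ (¬x_1 ∧ ¬x_1 ∧ ¬x_1) passes.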

Example 10. Figure 10 gives a 3DNF clause set and its WSD encoding. Check-
ing tautology of H := c_1 ∨ c_2 ∨ c_3 is equivalent to checking whether the nullary
tuple is certain in the answer to the query from the proof of Theorem 10. For-
mula H is not a tautology because it becomes false under the truth assign-
ment {x_1 ↦ true, x_2 ↦ true, x_3 ↦ false, x_4 ↦ true}. This is equivalent to
checking whether the nullary tuple is in the answer to our query in the world
A defined by the first local world of C_1 (encoding x_1 ↦ true), the first lo-
cal world of C_2 (encoding x_2 ↦ true), the second local world of C_3 (encoding
x_3 ↦ false), and the first local world of C_4 (encoding x_4 ↦ true). The relation
R^A is {(C : 1, P : 1), (C : 2, P : 1), (C : 1, P : 2), (C : 3, P : 2), (C : 2, P : 3)} and
the query answer is empty. □



6 Optimizing gWSDs 

In this section we study the problem of optimizing a given gWSD by further
decomposing its components using the product operation. We note that product
decomposition corresponds to the new notion of relational factorization. We de-
fine this notion and study some of its properties, like uniqueness and primality
or minimality in the context of relations without variables and the special ⊥
symbol. It turns out that any relation admits a unique minimal factorization,
and there is an algorithm, called prime-factorization, that can compute it effi-
ciently. We then discuss decompositions of (g)WSD components in the presence
of variables and the ⊥ symbol.

6.1 Prime Factorizations of Relations 

Definition 6. Let there be schemata R[U] and Q[U'] such that ∅ ⊂ U' ⊆ U. A
factor of a relation R over schema R[U] is a relation Q over schema Q[U'] such
that there exists a relation R' with R = Q × R'.

A factor Q of R is called proper if Q ≠ R. A factor Q is prime if it has no
proper factors. Two relations over the same schema are coprime if they have no
common factors.

Definition 7. Let R be a relation. A factorization of R is a set {C_1, ..., C_n}
of factors of R such that R = C_1 × ... × C_n.

In case the factors C_1, ..., C_n are prime, the factorization is said to be prime.
From the definition of relational product and factorization, it follows that the
schemata of the factors C_1, ..., C_n are a disjoint partition of the schema of R.

Proposition 10. For each relation a prime factorization exists and is unique.

Proof. Consider any relation R. Existence is clear because R admits the factor-
ization {R}, which is prime in case R is prime; otherwise, R has a proper factor
and the factors can be decomposed further until all of them are prime.

Uniqueness is next shown by contradiction. Assume R admits two different
prime factorizations {π_{U_1}(R), ..., π_{U_m}(R)} and {π_{V_1}(R), ..., π_{V_n}(R)}. Since the
two factorizations are different, there are two sets U_i, V_j such that U_i ≠ V_j
and U_i ∩ V_j ≠ ∅. But then, as of course R = π_{U−V_j}(R) × π_{V_j}(R), we have
π_{U_i}(R) = π_{U_i}(π_{U−V_j}(R) × π_{V_j}(R)) = π_{U_i−V_j}(R) × π_{U_i∩V_j}(R). It follows that
{π_{U_1}(R), ..., π_{U_{i−1}}(R), π_{U_i−V_j}(R), π_{U_i∩V_j}(R), π_{U_{i+1}}(R), ..., π_{U_m}(R)} is a factor-
ization of R, and the initial factorizations cannot be prime. Contradiction. □



algorithm prime-factorization ($S$)

// Input: Relation $S$ over schema $S[U]$.
// Result: Prime factorization of $S$ as a set $F_S$ of its prime factors.

1. $F_S := \{\pi_B(S) \mid B \in U, |\pi_B(S)| = 1\}$; $S := S \div \prod_{F \in F_S} F$;
2. if $S = \emptyset$ then return $F_S$;
3. choose any $A \in sch(S)$, $v \in \pi_A(S)$ such that $|\sigma_{A=v}(S)| \le |\sigma_{A \neq v}(S)|$;
4. $Q := \sigma_{A=v}(S)$; $R := \sigma_{A \neq v}(S)$;
5. foreach $F \in$ prime-factorization($Q$) do
6.     if $(R \div F) \times F = R$ then $F_S := F_S \cup \{F\}$;
7. if $\prod_{F \in F_S} F \neq S$ then $F_S := F_S \cup \{S \div \prod_{F \in F_S} F\}$;
8. return $F_S$;



Fig. 11. Computing the prime factorization of a relation. 



6.2 Computing Prime Factorizations 

This section first gives two important properties of relational factors and factor- 
izations. Based on them, it further devises an efficient yet simple algorithm for 
computing prime factorizations. 

Proposition 11. Let there be two relations $S$ and $F$, an attribute $A$ of $S$
and not of $F$, and a value $v \in \pi_A(S)$. Then, for some relations $R$, $E$,
and $I$ holds

$S = F \times R \Leftrightarrow \sigma_{A=v}(S) = F \times E$ and $\sigma_{A \neq v}(S) = F \times I$.

Proof. $\Rightarrow$. Note that the schemata of $F$ and $R$ represent a disjoint
partition of the schema of $S$ and thus $A$ is an attribute of $R$.

Relation $F$ is a factor of $\sigma_{A=v}(S)$ because

$\sigma_{A=v}(S) = \sigma_{A=v}(F \times R) = F \times \sigma_{A=v}(R)$.

Analogously, $F$ is a factor of $\sigma_{A \neq v}(S)$.

$\Leftarrow$. Relation $F$ is a factor of $S$ because

$S = \sigma_{A=v}(S) \cup \sigma_{A \neq v}(S) = F \times E \cup F \times I = F \times (E \cup I)$. □

Corollary 5. A relation $S$ is prime iff $\sigma_{A=v}(S)$ and $\sigma_{A \neq v}(S)$ are coprime.
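Proposition 11 can be checked on a small instance. The following is a sketch only, assuming relations are encoded as frozensets of frozensets of (attribute, value) pairs; all helper names (`rel`, `select`, `is_factor`, and so on) are hypothetical and chosen for illustration.

```python
from itertools import product as cartesian

def rel(attrs, rows):
    return frozenset(frozenset(zip(attrs, row)) for row in rows)

def schema(r):
    return frozenset(a for t in r for a, _ in t)

def project(r, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def times(q, r):
    return frozenset(t | u for t, u in cartesian(q, r))

def select(s, attr, value, equal=True):
    """sigma_{attr = value}(s) if equal is True, else sigma_{attr != value}(s)."""
    return frozenset(t for t in s if ((attr, value) in t) == equal)

def is_factor(f, s):
    return times(f, project(s, schema(s) - schema(f))) == s

# S = F x R, where F ranges over B and R over A and C.
f = rel(["B"], [["b1"], ["b2"]])
r = rel(["A", "C"], [["a1", "c1"], ["a2", "c1"], ["a2", "c2"]])
s = times(f, r)

eq, neq = select(s, "A", "a1"), select(s, "A", "a1", equal=False)

# Left-hand side and right-hand side of the equivalence for the factor F.
f_divides_s = is_factor(f, s)
f_divides_both = is_factor(f, eq) and is_factor(f, neq)

# pi_C(S) is not a factor of S, and it also fails on the partition.
g = project(s, {"C"})
g_divides_s = is_factor(g, s)
g_divides_both = is_factor(g, eq) and is_factor(g, neq)
```

In both cases the two sides of the equivalence agree, which is what the algorithm relies on when it probes the factors of $Q$ against $R$.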



The algorithm prime-factorization given in Figure 11 computes the prime
factorization of an input relation $S$ as follows. It first finds the trivial
prime factors with one attribute and one value (line 1). These factors
represent the prime factorization of $S$ in case the remaining relation is
empty (line 2). Otherwise, the remaining relation is disjointly partitioned
into relations $Q$ and $R$ (line 4) using any selection with constant $A = v$
such that $Q$ is not larger than $R$ (line 3). The prime factors of $Q$ are
then probed for factors of $R$ and in the positive case become prime factors
of $S$ (lines 5 and 6). This property is ensured by Proposition 11. The
remainder of $Q$ and $R$, which does not contain factors common to both $Q$
and $R$, becomes a factor of $S$ (line 7). According to Corollary 5, this
factor is also prime.
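The steps of Figure 11 can be sketched in Python as follows. This is an illustrative in-memory version under stated assumptions, not the secondary-storage implementation analyzed later: relations are frozensets of frozensets of (attribute, value) pairs, the input is non-empty, the choice in line 3 is fixed to the least frequent value of the alphabetically first attribute, and all function names are hypothetical.

```python
from itertools import product as cartesian

def rel(attrs, rows):
    return frozenset(frozenset(zip(attrs, row)) for row in rows)

def schema(r):
    return frozenset(a for t in r for a, _ in t)

def project(r, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def times(q, r):
    return frozenset(t | u for t, u in cartesian(q, r))

def is_factor(f, r):
    # Line 6 test: if F divides R, then R / F is the projection of R onto sch(R) - sch(F).
    return times(f, project(r, schema(r) - schema(f))) == r

def prime_factorization(s):
    """Return the prime factors of a non-empty relation s as a list."""
    attrs = schema(s)
    # Line 1: single-valued attributes are trivial prime factors; divide them out.
    trivial = [project(s, {b}) for b in sorted(attrs) if len(project(s, {b})) == 1]
    rest = attrs - {a for f in trivial for a in schema(f)}
    s = project(s, rest)
    # Line 2: no attribute is left, so the trivial factors are the result.
    if not rest:
        return trivial
    # Line 3: each remaining attribute has at least two values, so its least
    # frequent value selects at most half of the tuples.
    a = min(rest)
    counts = {}
    for t in s:
        value = dict(t)[a]
        counts[value] = counts.get(value, 0) + 1
    v = min(sorted(counts), key=counts.get)
    # Line 4: partition S into Q and R.
    q = frozenset(t for t in s if (a, v) in t)
    r = s - q
    # Lines 5 and 6: prime factors of Q that are also factors of R.
    common = [f for f in prime_factorization(q) if is_factor(f, r)]
    # Line 7: the part of S not covered by common factors is one more prime factor.
    covered = {b for f in common for b in schema(f)}
    if covered != rest:
        common.append(project(s, rest - covered))
    return trivial + common

# A relation with three prime factors over {A, B, C}, {D}, and {E}.
abc = rel(["A", "B", "C"], [["a1", "b1", "c1"], ["a2", "b2", "c2"]])
s = times(times(abc, rel(["D"], [["d1"], ["d2"]])), rel(["E"], [["e1"], ["e2"]]))
factors = prime_factorization(s)
factor_schemata = sorted(sorted(schema(f)) for f in factors)
```

For this input, `factor_schemata` lists the three attribute sets `{A, B, C}`, `{D}`, and `{E}`, and the product of the returned factors gives back `s`.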

Example 11. We exemplify our prime factorization algorithm using a relation
$S$ with three prime factors.

To ease the explanation, we next consider all variables of the algorithm
followed by an exponent $i$, to uniquely identify their values at recursion
depth $i$.

Consider the sequence of selection parameters $(A, a_1)$, $(D, d_1)$, $(E, e_1)$.

The relation $S^1$ has no factors with one attribute. We next choose the
selection parameters $(A, a_1)$, which partition $S^1$ into
$Q^1 = \sigma_{A=a_1}(S^1)$ and $R^1 = \sigma_{A \neq a_1}(S^1)$.

We proceed to depth two with $S^2 = Q^1$. We initially find the prime factors
with one of the attributes $A$, $B$, and $C$. We further choose the selection
parameters $(D, d_1)$ and obtain $Q^2$ and $R^2$ from the remaining relation.

We proceed to depth three with $S^3 = Q^2$. We initially find the prime factor
with the attribute $D$. We further choose the selection parameters $(E, e_1)$
and obtain $Q^3$ and $R^3$ as follows:





    $Q^3$ | E           $R^3$ | E
    ------+-------      ------+-------
          | $e_1$             | $e_2$



We proceed to depth four with $S^4 = Q^3$. We find the only prime factor
$\pi_E(Q^3) = Q^3$ with the attribute $E$ and return the set $\{Q^3\}$.

At depth three, we check whether $Q^3$ is also a factor of $R^3$. It is not,
and we infer that $Q^3 \cup R^3$ is a prime factor of $S^3$ (the other prime
factor $\pi_D(Q^2)$ was already detected). We thus return
$\{\pi_D(Q^2), \pi_E(Q^2)\}$.

At depth two, we check the factors of $Q^2$ for being factors of $R^2$ and find
that $\pi_E(Q^2)$ is also a factor of $R^2$, whereas $\pi_D(Q^2)$ is not. The
set of prime factors of $Q^1$ is thus
$\{\pi_E(Q^2), \pi_A(Q^1), \pi_B(Q^1), \pi_C(Q^1), \pi_D(Q^1)\}$, where
$\pi_A(Q^1)$, $\pi_B(Q^1)$, and $\pi_C(Q^1)$ were already detected as factors
with one attribute and one value, and $\pi_D(Q^1)$ is the rest of $Q^1$.

At depth one, we find that only $\pi_E(Q^2)$ and $\pi_D(Q^1)$ are also factors
of $R^1$. Thus the prime factorization of $S^1$ is
$\{\pi_E(Q^2), \pi_D(Q^1), \pi_{A,B,C}(S^1)\}$. The last factor is computed in
line 7 by dividing $S^1$ by the product of the factors $\pi_E(Q^2)$ and
$\pi_D(Q^1)$. □



Remark 4. It can be easily verified that choosing another sequence of selection
parameters, e.g., $(D, d_1)$, $(E, e_1)$ and $(A, a_1)$, does not change the
output of the algorithm.

Because the prime factorization is unique, the choice of the attribute $A$ and
value $v$ (line 3) cannot influence it. However, choosing $A$ and $v$ such that
$|\sigma_{A=v}(S)| \le |\sigma_{A \neq v}(S)|$ ensures that with each recursion
step the input relation to work on is at least halved. This affects the
worst-case complexity of our algorithm.

In general, there is no unique choice of $A$ and $v$ that halves the input
relation. There are choices that lead to faster factorizations by minimizing
the number of recursive calls and also the sizes of the intermediary relations
$Q$. □

Theorem 11. The algorithm of Figure 11 computes the prime factorization of
any relation.

Proof. The algorithm terminates, because (1) the input size at the recursion
depth $i$ is smaller (at least halved) than at the recursion depth $i - 1$, and
(2) the initial input is finite.

We first show by complete induction on the recursion depth that the algorithm
is sound, i.e., it computes the prime factorization of the input relation.

Consider $d$ the maximal recursion depth. To ease the rest of the proof, we
uniquely identify the values of variables at recursion depth $i$
($1 \le i \le d$) by an exponent $i$.

Base Case. We show that at maximal recursion depth $d$ the algorithm computes
the prime factorization. This factorization corresponds to the case of a
relation $S^d$ with a single tuple (line 2), where each attribute induces a
prime factor (line 1).

Induction Step. We know that $F_S^{i+1}$ represents the prime factorization of
$S^{i+1} = Q^i$ and show that $F_S^i$ represents the prime factorization of
$S^i$.

Each factor $F$ of $Q^i$ is first checked for being a factor of $R^i$ (lines 5
and 6). This check uses the definition of relational division: the product of
$F$ and the division of $R^i$ by $F$ must give back $R^i$. Using Proposition 11,
each factor $F$ common to $Q^i$ and $R^i$ is also a factor of $S^i$. Obviously,
because $F$ is prime in $Q^i$, it is also prime in $S^i$.

We next treat the case when the factors common to $Q^i$ and $R^i$ do not
entirely cover $S^i$ (line 7). Let $P$ be the product of all factors common to
$Q^i$ and $R^i$, i.e., $P = \prod F_S^i$. Then there exist $Q^i_{\div}$ and
$R^i_{\div}$ such that $Q^i = Q^i_{\div} \times P$ and
$R^i = R^i_{\div} \times P$. It follows that
$S^i = Q^i \cup R^i = (Q^i_{\div} \cup R^i_{\div}) \times P$, thus
$(Q^i_{\div} \cup R^i_{\div})$ is a factor of $S^i$. Because $Q^i_{\div}$ and
$R^i_{\div}$ are coprime (otherwise they would have a common factor),
Corollary 5 ensures that their union $(Q^i_{\div} \cup R^i_{\div})$ is prime.

This concludes the proof that the algorithm is sound. The completeness follows
from Proposition 11, which ensures that the factors of $S^i$ that do not have
the chosen attribute $A$ are necessarily factors of both $Q^i$ and $R^i$ at any
recursion depth $i$. Additionally, this holds independently of the choice of
the selection parameters. □

Our relational factorization is a special case of algebraic factorization of
Boolean functions, as used in multilevel logic synthesis [11]. In this light,
our algorithm can be used to algebraically factorize disjunctions of
conjunctions of literals. A factorization is then a conjunction of factors,
which are disjunctions of conjunctions of literals. This factorization is only
algebraic, because Boolean identities (e.g., $x \cdot x = x$) do not make sense
in our context and thus are not considered. (Note that Boolean factorization is
NP-hard, see e.g. [11].)
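As a small worked illustration of this correspondence (the attribute names $X$, $Y$ and the literals $a, b, c, d$ are chosen here for the example only), take the relation over attributes $X, Y$ with the four tuples $(a, c), (a, d), (b, c), (b, d)$. Read as a disjunction of conjunctions, it factorizes algebraically into a conjunction of two disjunctions, which mirrors the relational factorization $\pi_X \times \pi_Y$:

```latex
\[
(a \wedge c) \vee (a \wedge d) \vee (b \wedge c) \vee (b \wedge d)
\;=\; (a \vee b) \wedge (c \vee d)
\]
\[
\{(a,c),\,(a,d),\,(b,c),\,(b,d)\} \;=\; \{(a),\,(b)\} \times \{(c),\,(d)\}
\]
```

No Boolean identity such as absorption or idempotence is needed to verify the first equation; distributing the right-hand side yields the left-hand side term by term.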

The algorithm of Figure 11 computes prime factorizations in polynomial time
and linear space, as stated by the following theorem.

Theorem 12. The prime factorization of a relation $S$ with arity $m$ and size
$n$ is computable in time $O(m \cdot n \cdot \log n)$ and space
$O(n + m \cdot \log n)$.

Proof. The complexity results consider the input and the temporary relations
available in secondary storage.

The computations in lines 1, 3, 4, and 7 require a constant number of scans
over $S$. The number of prime factors of a relation is bounded in its arity.
The division test in line 6 can also be implemented as

$\pi_{sch(P)}(R) = P$ and $|P| \cdot |\pi_{sch(R) - sch(P)}(R)| = |R|$.

(Here $sch$ maps relations to their schemata.) This requires sorting $P$ and
$\pi_{sch(P)}(R)$, and scanning $R$ two times and $P$ one time. The size of $P$
is logarithmic in the size of $Q$, whereas $Q$ and $R$ have sizes linear in the
size of $S$. The recursive call in line 5 is done on $Q$, whose size is at most
half of the size of $S$.



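The recurrence relation for the time complexity is then (for sufficiently large
constant $a$; $n$ is the size of $S$ and $m$ is the arity of $S$)

$$T(n) = 7n + m \cdot n \cdot \log n + T\left(\frac{n}{2}\right)$$

$$\le T'(n) = a \cdot m \cdot n \cdot \log n + T'\left(\frac{n}{2}\right) = a \cdot m \cdot \sum_{i=1}^{\lceil \log n \rceil} \frac{n}{2^i} \cdot \log \frac{n}{2^i}$$

$$\le a \cdot m \cdot \sum_{i=1}^{\infty} \frac{n}{2^i} \cdot \log n = a \cdot m \cdot n \cdot \log n = O(m \cdot n \cdot \log n).$$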



Each factor of $S$ requires space logarithmic in the size of $S$. The sum of
the sizes of the relations $Q$ and $R$ is the size of $S$. Then, the recurrence
relation for the space complexity is ($n$ is the size of $S$ and $m$ is the
arity of $S$)

$$S(n) = n + m \cdot \log n + S\left(\frac{n}{2}\right) = \sum_{i=1}^{\lceil \log n \rceil} \left(\frac{n}{2^i} + m \cdot \log \frac{n}{2^i}\right) = O(n + m \cdot \log n).$$

□ 

We can further trade the space used to explicitly store the temporary
relations $Q$, $R$, and the factors for the time needed to recompute them. For
this, the temporary relations computed at any recursion depth $i$ are defined
intensionally as queries constructed using the chosen selection parameters.
This leads to a sublinear space complexity at the expense of an additional
logarithmic factor for the time complexity.
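The idea of an intensional definition can be sketched as follows: instead of storing a temporary relation, one stores only its schema and the list of selection parameters chosen so far, and recomputes the relation by a scan over the original $S$ when it is needed. The encoding of relations and the names `materialize`, `description_q1`, and `description_q2` are assumptions made for this illustration.

```python
from itertools import product as cartesian

def rel(attrs, rows):
    return frozenset(frozenset(zip(attrs, row)) for row in rows)

def times(q, r):
    return frozenset(t | u for t, u in cartesian(q, r))

def materialize(s, attrs, params):
    """Recompute a temporary relation from its description: select the tuples of s
    that satisfy every stored condition A = v, then project onto attrs."""
    selected = (t for t in s if all((a, v) in t for a, v in params))
    return frozenset(frozenset(p for p in t if p[0] in attrs) for t in selected)

abc = rel(["A", "B", "C"], [["a1", "b1", "c1"], ["a2", "b2", "c2"]])
s = times(times(abc, rel(["D"], [["d1"], ["d2"]])), rel(["E"], [["e1"], ["e2"]]))

# Descriptions kept instead of the relations themselves: a schema and a list of
# attribute-value pairs whose length is bounded by the recursion depth.
description_q1 = ({"D", "E"}, [("A", "a1")])
description_q2 = ({"E"}, [("A", "a1"), ("D", "d1")])

q1 = materialize(s, *description_q1)
q2 = materialize(s, *description_q2)
```

Each description grows by one attribute-value pair per recursion level, which is the source of the logarithmic space bound; the price is one scan of `s` per recomputation.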

Proposition 12. The prime factorization of a relation $S$ with arity $m$ and
size $n$ is computable in time $O(m \cdot n \cdot \log^2 n)$ and space
$O(m \cdot \log n)$.

Proof. We can improve the space complexity result of Theorem 12 in the
following way. The temporary relations computed at any recursion depth $i$ are
defined intensionally as queries constructed using their schema, say $U^i$, and
the chosen selection parameters $(A^i, v^i)$.

The relation $Q^j$ at recursion depth $j \le i$ is

$$Q^j = \pi_{U^j}(\sigma_{\phi^j}(S)), \qquad \phi^j = \bigwedge_{1 \le i \le j} (A^i = v^i).$$

The relation $R^j$ is defined similarly, and their factors additionally require
storing only their schema. Such queries have size bounded in the maximal
recursion depth, thus in the logarithm of the input relation size. At each
recursion depth, only an attribute-value pair needs to be stored. Thus the
space complexity becomes ($n$ is the size of $S$ and $m$ is the arity of $S$)

$$S(n, m) = m \cdot \log n + S\left(\frac{n}{2}, m\right) \le \sum_{i=1}^{\lceil \log n \rceil} m \cdot \log \frac{n}{2^i} \le \sum_{i=1}^{\infty} m \cdot \log \frac{n}{2^i} = m \cdot \log n.$$

Fig. 12. Non-unique decompositions of attribute-level WSDs with $\bot$ symbols.



The time complexity increases, however. All temporary relations need to be
recomputed from the original relation $S$ seven times at each recursion depth.
Thus, in contrast to $T(n)$ from the proof of Theorem 12, the factor
$\frac{1}{2^i}$ does not appear in the new formula of $T(n')$. The new
recurrence function for $T(n')$ (for sufficiently large $a > 0$; $n$ is the
size of the initial $S$ and $m$ is the arity of the initial $S$; $n'$ is
initially $n$) is

$$T(n') = 7n + m \cdot n \cdot \log n + T\left(\frac{n'}{2}\right)$$

$$\le T'(n') = a \cdot m \cdot n \cdot \log n + T'\left(\frac{n'}{2}\right) = a \cdot m \cdot n \cdot \log n \cdot \sum_{i=1}^{\lceil \log n \rceil} 1$$

$$= a \cdot m \cdot n \cdot \log n \cdot \lceil \log n \rceil = O(m \cdot n \cdot \log^2 n).$$



□ 



Remark 5. An important property of our algorithm is that it is polynomial in
both the arity and the size of the input relation $S$. If the arity is
considered constant, then a trivial prime factorization algorithm (yet
exponential in the arity of $S$) can be devised as follows: First compute the
powerset $PS(U)$ over the set $U$ of attributes of $S$. Then, test for each set
$U' \in PS(U)$ whether $\pi_{U'}(S) \times \pi_{U - U'}(S) = S$ holds. In the
positive case, a factorization is found with factors $\pi_{U'}(S)$ and
$\pi_{U - U'}(S)$, and the same procedure is now applied to these factors until
all prime factors are found. Note that this algorithm requires time exponential
in the arity of the input relation (due to the powerset construction).
Additionally, if the arity of the input relation is constant, then the question
whether a relation $S$ is prime (or factorizable) becomes FO-expressible (also
supported by the space complexity given in Proposition 12). □
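The trivial powerset-based procedure of Remark 5 can be sketched as follows. This is an illustration under the same assumed encoding of relations as frozensets of frozensets of (attribute, value) pairs; the function name `brute_force_factorization` is hypothetical. It enumerates the non-empty proper subsets $U'$ of the schema, so its running time is exponential in the arity.

```python
from itertools import combinations, product as cartesian

def rel(attrs, rows):
    return frozenset(frozenset(zip(attrs, row)) for row in rows)

def schema(r):
    return frozenset(a for t in r for a, _ in t)

def project(r, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def times(q, r):
    return frozenset(t | u for t, u in cartesian(q, r))

def brute_force_factorization(s):
    """Try every non-empty proper subset U' of the schema; on the first split
    with pi_{U'}(S) x pi_{U-U'}(S) = S, recurse into both factors."""
    attrs = sorted(schema(s))
    for k in range(1, len(attrs)):
        for subset in combinations(attrs, k):
            left = project(s, set(subset))
            right = project(s, set(attrs) - set(subset))
            if times(left, right) == s:
                return brute_force_factorization(left) + brute_force_factorization(right)
    return [s]  # no split exists, so s is prime

abc = rel(["A", "B", "C"], [["a1", "b1", "c1"], ["a2", "b2", "c2"]])
s = times(times(abc, rel(["D"], [["d1"], ["d2"]])), rel(["E"], [["e1"], ["e2"]]))
brute_schemata = sorted(sorted(schema(f)) for f in brute_force_factorization(s))
```

By Proposition 10 the result does not depend on the order in which subsets are tried, so this procedure can serve as a reference against which a polynomial implementation is checked on small inputs.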



6.3 Optimization Flavors 

The algorithm for relational prime factorization can be applied to find
maximal decompositions of (g)WSD components, i.e., minimal representations of
(g)WSDs. Differently from the relational case, however, the presence of the
$\bot$ symbol and of variables may lead to non-uniqueness and even to
non-primality of the (g)WSD factors produced by our algorithm. As Figure 12
shows, the $\bot$ symbol is one reason for non-unique maximal decompositions of
attribute-level WSDs.

Fig. 13. Equivalent maximal decompositions of tuple-level gWSDs ($x$ and $y$
are variables, the global condition is true).

Fortunately, the tuple-level WSDs have maximal decompositions that are unique
modulo the representation of $t_\bot$-tuples and can be efficiently computed by
a trivial extension of our algorithm with the tuple-level constraint. Recall
that any tuple $(A_1 : a_1, \ldots, A_n : a_n)$, where at least one $a_i$ is
$\bot$, is a $t_\bot$-tuple.

Proposition 13. Any tuple-level WSD has a unique maximal decomposition. 

Proof. Let $W = \{C_1, \ldots, C_n\}$ be a tuple-level WSD over schema
$(R_1[U_1], \ldots, R_k[U_k])$, where $U_j = (A_j^1, \ldots, A_j^{n_j})$.

Construction. We define a translation $f$ that maps each component $C_i$ of
$W$ to an ordinary relation $S_{C_i}$ by compactifying each tuple $t$ of schema
$R_j.d.U_j$ defined by $C_i$ into one value $(t)$ of schema $R_j.d.(U_j)$,
where $(U_j)$ is an attribute. We map all $t_\bot$-tuples defined by $C_i$ to
the $\bot$ symbol. We can now apply the algorithm prime-factorization, where
the $\bot$ symbol is treated as a constant.

Correctness. We show that there is an equivalence modulo our translation
between maximal decompositions of $S_{C_i}$ and of $C_i$. Let
$\{P_1, \ldots, P_l\}$ and $\{P'_1, \ldots, P'_{l'}\}$ be maximal
decompositions of $C_i$ and $S_{C_i}$, respectively. Because of our tuple-level
constraint, each tuple identifier occurs in the schema of exactly one $P_j$ and
$P'_j$. We show that $l = l'$ and $f(P_j)$ is in $P'_1, \ldots, P'_{l'}$ modulo
the representation of $t_\bot$-tuples (which does not change the semantics of
$W$).

Assume $l' > l$. Then, there exist two identifiers $d_1$ and $d_2$, whose
tuples are defined in different components of $S_{C_i}$ and the same component
of $C_i$. If $d_1$ and $d_2$ have no $\bot$-values, then we are in the case of
ordinary relations and the algorithm would have found the same decomposition
for $C_i$ and $S_{C_i}$. A $\bot$-value for one of them cannot influence the
values for the other and thus by treating $\bot$ as a constant, our algorithm
would have found again the same decomposition. Contradiction. We thus have
$l = l'$ and the tuples $t$ of an identifier $d$ are defined by a component
$P_j$ of $C_i$ iff $f(t)$ is defined by a $P'_j$ of $S_{C_i}$. The case of
$l > l'$ can be shown similarly. □
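The translation $f$ used in the construction can be sketched as follows. The data layout is an assumption made for illustration only: a component is a list of rows, each row is a dict keyed by (relation name, tuple identifier, attribute), and the string `BOT` stands in for the $\bot$ symbol. The function name `compactify` is hypothetical.

```python
BOT = "⊥"  # stand-in for the bottom symbol, treated as an ordinary constant

def compactify(component):
    """Map each row of a tuple-level component to a row with one column per
    (relation, identifier): the tuple's values become a single compound value,
    and any tuple containing BOT is mapped to BOT."""
    compact = []
    for row in component:
        grouped = {}
        # Keys are unique, so sorting orders the fields by relation, identifier, attribute.
        for (relation, ident, _attr), value in sorted(row.items()):
            grouped.setdefault((relation, ident), []).append(value)
        compact.append({key: BOT if BOT in values else tuple(values)
                        for key, values in grouped.items()})
    return compact

# One component defining the tuples d1 and d2 of a relation R[A, B] in two local worlds.
component = [
    {("R", "d1", "A"): 1, ("R", "d1", "B"): 2, ("R", "d2", "A"): 3, ("R", "d2", "B"): 4},
    {("R", "d1", "A"): 1, ("R", "d1", "B"): 2, ("R", "d2", "A"): BOT, ("R", "d2", "B"): 5},
]
compact_component = compactify(component)
```

After this step every tuple identifier corresponds to exactly one column, so a relational prime factorization of the compact relation can never split the fields of one tuple across factors, which is the tuple-level constraint.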

The variables are a source of hardness in finding maximal decompositions
of tuple-level gWSDs. By freezing variables and considering them constant, the
three decompositions given in Figure 13 cannot be found by our algorithm.

The gWSD optimization discussed here is a facet of the more general problem
of finding minimal representations for a given g-tabset or world-set. To find a
minimal representation for a given g-tabset A, one has to take into account all
possible inlinings for the g-tables of A in g-tabset tables. Recall from
Section 3 that we consider a fixed arbitrary inlining order of the tuples of
the g-tables in A. Such an order is supported by common identifiers of tuples
from different worlds, as maintained in virtually all representation systems
[19,3,17,5] and exploited in practitioner's representation systems such as
[8,4]. We note that when no correspondence between tuples from different worlds
has to be preserved, smaller representations of the same world-set may be
possible.